Deep Learning – Dogs vs Cats Project

Cat vs Dog Image Classifier Using Deep Learning

Concepts Covered:

In this project, you’ll build a real AI model that can look at an image and tell you whether it’s a cat or a dog. Along the way, you’ll learn:

  • What deep learning actually is (in plain English)
  • How to use Google Colab for free GPU power
  • How to train your first neural network
  • How to test your model with real images

Prerequisites: Basic Python knowledge (variables, loops, functions). That’s it!

Time Required: 25-35 minutes

Cost: $0 (completely free using Google Colab)


Before We Start: Understanding the Key Concepts

Before diving into code, let’s understand what we’re actually doing. Think of this section as your foundation – skip it, and you’ll be confused later!

What is Deep Learning?

Simple explanation: Deep learning is teaching a computer to recognize patterns by showing it thousands of examples.

Imagine teaching a child what a cat looks like. You don’t explain “cats have pointy ears and whiskers.” Instead, you show them 100 pictures of cats, and their brain learns the pattern automatically. Deep learning works the same way.

Traditional programming vs Deep Learning:

  • Traditional: You write rules (“if it has 4 legs AND pointy ears → cat”)
  • Deep learning: You show examples, the computer creates its own rules

What is a Neural Network?

A neural network is a computer system inspired by how your brain works. Your brain has neurons (brain cells) connected together. When you see a cat, neurons fire in sequence to recognize it.

An artificial neural network mimics this:

  • Input layer: Receives the image (pixel values)
  • Hidden layers: Process patterns (edges, shapes, textures)
  • Output layer: Makes the final decision (cat or dog?)

What is a CNN (Convolutional Neural Network)?

CNNs are specialized neural networks for images. Here’s why regular neural networks struggle with images:

Problem: A 224×224 pixel color image has 150,528 numbers (224 × 224 × 3 colors). A regular neural network would need millions of connections – too complex!

Solution: CNNs use “convolution” – they scan the image in small patches (like reading a book word by word, not all at once). This dramatically reduces complexity.
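The patch-scanning idea can be sketched in a few lines of NumPy. This is a toy 2×2 convolution for illustration only – real CNN layers like MobileNetV2’s use many learned filters, not this hand-written one:

```python
import numpy as np

# A tiny 4x4 "image": dark (0) on the left, bright (1) on the right
img = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A 2x2 filter that responds to left-to-right brightness changes
kernel = np.array([
    [1, -1],
    [1, -1],
], dtype=float)

# Slide the filter over every 2x2 patch (stride 1, no padding)
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(img[i:i+2, j:j+2] * kernel)

print(out)  # only the middle column responds - that's where the edge is
```

Notice the output is non-zero only where the patch straddles the dark-to-bright boundary: the filter has “detected” a vertical edge.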

What CNNs learn in each layer:

  • Layer 1: Simple edges and lines
  • Layer 2: Shapes and curves
  • Layer 3: Object parts (eyes, ears, fur texture)
  • Layer 4: Whole objects (entire cat face)

What is Transfer Learning? (The Secret Weapon!)

Here’s the breakthrough that makes this project possible:

The Problem: Training a CNN from scratch needs:

  • 100,000+ images
  • Powerful computers
  • Days of training time
  • Expert knowledge

The Solution – Transfer Learning:

Think of it like this: You want to become a chef specializing in Italian food. Do you:

  • Option A: Learn cooking from zero (5 years)
  • Option B: Study with an expert chef who already knows cooking basics, then specialize in Italian cuisine (6 months)

Transfer learning is Option B for AI.

We’ll use MobileNetV2, a neural network that Google already trained on over a million images from the ImageNet dataset, spanning 1000 categories (cars, planes, animals, furniture, etc.). It already knows:

  • What edges look like
  • How to detect shapes
  • What fur, eyes, and ears are

We just teach it the final step: “This combination means cat, that means dog.”

Result: Instead of needing 100,000 images and days of training, we need just 1,000 images and 5 minutes!


Understanding the Modules We’ll Use

Let’s break down every library and why we need it:

1. TensorFlow & Keras

  • What it is: TensorFlow is Google’s deep learning framework. Keras is its user-friendly interface.
  • Why we need it: Builds and trains neural networks
  • Analogy: TensorFlow is the engine, Keras is the steering wheel

2. PIL (Python Imaging Library)

  • What it is: A library for opening and manipulating images
  • Why we need it: To load and display images
  • What it does: Converts image files into arrays of numbers that computers understand

3. Matplotlib

  • What it is: A plotting library (like Excel charts for Python)
  • Why we need it: To visualize images and results
  • What we’ll use it for: Displaying cat/dog images

4. ImageDataGenerator

  • What it is: A Keras tool that feeds images to the neural network
  • Why it’s powerful: Automatically handles batching, shuffling, and splitting data
  • Real benefit: You don’t manually write code to load 1000 images – it does it automatically!

5. MobileNetV2

  • What it is: A pre-trained CNN architecture by Google
  • Why this one: It’s small, fast, and accurate (perfect for beginners)
  • Alternative models: ResNet, VGG16, InceptionV3 (we can discuss these later)

6. NumPy

  • What it is: Python’s numerical computation library
  • Why we need it: Images are stored as arrays of numbers; NumPy handles array math
  • Example: An image is a 224×224×3 array (height × width × color channels)
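A quick NumPy check of that arithmetic – a standalone sketch, separate from the tutorial code:

```python
import numpy as np

# A blank 224x224 RGB image represented as a NumPy array
img = np.zeros((224, 224, 3), dtype=np.uint8)

print(img.shape)  # (224, 224, 3) - height x width x color channels
print(img.size)   # 150528 numbers in total
```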

Step-by-Step Tutorial: Building Your Cat vs Dog Classifier

Now let’s build this! Follow each command in order.


COMMAND 1: Download the Dataset

What we’re doing: Getting 1000 cat and dog images for training

Instructions:

  1. Click this link: https://drive.google.com/drive/folders/1NfvqNLyvT2uBNBYB-9PS97w71RcUNsPB or https://www.kaggle.com/datasets/anthonytherrien/dog-vs-cat
  2. Download the entire folder to your computer
  3. You’ll get two folders: cats (500 images) and dogs (500 images)

Why 1000 images? More is better, but 500 per category is the minimum for decent accuracy with transfer learning.


COMMAND 2: Set Up Google Colab

What is Google Colab?

  • A free cloud-based Jupyter notebook
  • Gives you free access to GPUs (graphics cards that train AI 10x faster)
  • No installation needed – runs in your browser!

Steps:

  1. Go to https://colab.research.google.com/
  2. Click “New Notebook”
  3. You’ll see a blank coding environment

Pro tip: Colab automatically disconnects after 90 minutes of inactivity. Don’t worry – your code is saved!


COMMAND 3: Upload Dataset to Colab

What we’re doing: Creating folders and uploading images

Steps:

  1. Click the folder icon on the left sidebar (Files panel)
  2. Right-click in the file area → New folder → Name it animals
  3. Inside animals, create two folders: cats and dogs
  4. Upload images:
    • Click on the cats folder → Upload button → Select all cat images
    • Click on the dogs folder → Upload button → Select all dog images

Wait for upload to complete! You’ll see a progress indicator. 1000 images take 2-5 minutes depending on your internet speed.

Why this folder structure?

animals/
├── cats/
│   ├── cat1.jpg
│   ├── cat2.jpg
│   └── ...
└── dogs/
    ├── dog1.jpg
    ├── dog2.jpg
    └── ...

ImageDataGenerator reads this structure automatically and knows:

  • Everything in cats/ folder → label = “cat”
  • Everything in dogs/ folder → label = “dog”
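Here is a minimal sketch of the folder-name-to-label idea using only the standard library. It builds a throwaway copy of the folder structure above; ImageDataGenerator does this same mapping for you (plus resizing, batching, and shuffling):

```python
import tempfile
from pathlib import Path

# Build a throwaway animals/ folder with the tutorial's structure
root = Path(tempfile.mkdtemp()) / "animals"
for cls in ("cats", "dogs"):
    (root / cls).mkdir(parents=True)
(root / "cats" / "cat1.jpg").touch()
(root / "dogs" / "dog1.jpg").touch()

# The label of each file is simply its parent folder's name
labels = {p.name: p.parent.name for p in sorted(root.rglob("*.jpg"))}
print(labels)  # {'cat1.jpg': 'cats', 'dog1.jpg': 'dogs'}
```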

COMMAND 4: Verify Upload

What we’re doing: Checking that files uploaded correctly

Code to run:

!ls

What this means:

  • ! tells Colab to run a terminal command (not Python)
  • ls = “list” (shows all files and folders)

Expected output:

animals  sample_data

Explanation: You should see animals folder. sample_data is a default Colab folder (ignore it).


COMMAND 5: Verify Folder Contents

Code to run:

!ls animals

Expected output:

cats  dogs

What this confirms: Both subfolders exist inside animals


COMMAND 6: Display a Sample Image (Sanity Check)

What we’re doing: Making sure images are readable and correctly uploaded

Code to run:

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("/content/animals/cats/00000-4122619873.png")
plt.imshow(img)
plt.axis("off")

Line-by-line breakdown:

Line 1-2: Import libraries

  • from PIL import Image → Imports the Image class from PIL library
  • import matplotlib.pyplot as plt → Imports plotting functions

Line 4: img = Image.open("/content/animals/cats/00000-4122619873.png")

  • Image.open() → Opens the image file
  • /content/ → Default Colab working directory
  • 00000-4122619873.png → Replace with any filename from your cats folder

Line 5: plt.imshow(img)

  • imshow = “image show”
  • Displays the image

Line 6: plt.axis("off")

  • Hides the x and y axis numbers (makes it cleaner)

Expected output: You should see a cat image displayed!

Troubleshooting:

  • Error: “No such file” → Check your filename exactly matches
  • No image appears → Run plt.show() after the code

COMMAND 7: List Sample Files

What we’re doing: Viewing the first 10 files in each folder to verify variety

Code to run:

!ls animals/cats | head
!ls animals/dogs | head

What this means:

  • !ls animals/cats → List all files in cats folder
  • | head → Show only the first 10 (otherwise it’d show all 500!)

Expected output:

00000-4122619873.png
00001-2847563902.png
00002-1928374650.png
...
(10 files total)

Why this matters: Confirms you have multiple images, not just one test file.


COMMAND 8: Import ImageDataGenerator and Prepare Data

What we’re doing: Setting up the data pipeline that feeds images to our neural network

Code to run:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os

# Remove unwanted .ipynb_checkpoints directory if it exists
if os.path.exists("/content/animals/.ipynb_checkpoints"):
    !rm -rf "/content/animals/.ipynb_checkpoints"
    print("Removed .ipynb_checkpoints directory.")

IMG_SIZE = (224, 224)
BATCH_SIZE = 32

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)

val_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="validation"
)

DETAILED BREAKDOWN – READ CAREFULLY!

Lines 1-2: Import necessary modules

  • ImageDataGenerator → The tool that handles image loading
  • os → Operating system module (to check if folders exist)

Lines 4-7: Clean up hidden files

  • Jupyter creates hidden .ipynb_checkpoints folders that confuse ImageDataGenerator
  • os.path.exists() → Checks if the folder exists
  • !rm -rf → Removes the folder forcefully (if found)
  • This prevents the error: “Found 3 classes instead of 2”

Line 9: IMG_SIZE = (224, 224)

  • All images must be the same size for neural networks
  • 224×224 pixels is the standard for MobileNetV2
  • Your original images might be 800×600, 1920×1080, etc. – they’ll auto-resize

Line 10: BATCH_SIZE = 32

  • What is a batch? Instead of feeding 1 image at a time, we feed 32 together
  • Why? GPUs process multiple images in parallel (faster training)
  • Analogy: Washing dishes one-by-one vs loading the dishwasher

Lines 12-15: Create the ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

Parameter explanations:

rescale=1./255 – SUPER IMPORTANT!

  • Images are stored as pixels with values 0-255 (black to white)
  • Neural networks work best with values 0-1 (normalized)
  • 1./255 divides every pixel by 255 → converts 0-255 to 0-1
  • Example: Pixel value 127 (gray) becomes 127/255 = 0.498
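You can verify the normalization with plain NumPy – this standalone sketch applies the same operation that rescale=1./255 applies to every pixel:

```python
import numpy as np

# Raw pixel values 0-255 (black, mid-gray, white)
pixels = np.array([0, 127, 255], dtype=np.float32)

# Divide every pixel by 255, exactly as rescale=1./255 does
normalized = pixels / 255.0

print(normalized)  # [0.0, ~0.498, 1.0]
```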

validation_split=0.2

  • Splits data into 80% training, 20% validation
  • Training data: Used to teach the model
  • Validation data: Used to test if it learned correctly
  • Why split? Prevents “memorization” – we want the model to generalize, not just remember training images

Lines 17-23: Create training data generator

train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)

What flow_from_directory does: Automatically:

  1. Scans the /content/animals folder
  2. Finds subfolders (cats, dogs)
  3. Labels images based on folder name
  4. Resizes all images to 224×224
  5. Creates batches of 32 images
  6. Shuffles images randomly (prevents learning order bias)

Parameter breakdown:

"/content/animals" → Path to parent folder containing class subfolders

target_size=IMG_SIZE → Resize all images to (224, 224)

batch_size=BATCH_SIZE → Load 32 images at a time

class_mode="binary" → We have 2 classes (binary classification)

  • Binary = Cat (0) or Dog (1)
  • Alternative: categorical for 3+ classes (cat, dog, bird)

subset="training" → Use the 80% split for training

Lines 25-31: Create validation data generator

  • Same code, but subset="validation" uses the 20% split
  • This is the “test” data we use to check accuracy

Expected output:

Found 800 images belonging to 2 classes.
Found 200 images belonging to 2 classes.

What this means:

  • 800 images in training set (80% of 1000)
  • 200 images in validation set (20% of 1000)
  • 2 classes detected: cats and dogs

Common error fix: If it says “Found 3 classes” – the .ipynb_checkpoints cleanup didn’t work. Manually delete that folder from the file panel.


COMMAND 9: Import Deep Learning Libraries

Code to run:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models

What we’re importing:

MobileNetV2 → The pre-trained model

  • Already trained on over a million ImageNet images
  • Knows 1000 categories (dogs, cats, cars, planes, etc.)
  • We’ll fine-tune it for our specific task

layers → Building blocks of neural networks

  • Dense → Fully connected layer
  • GlobalAveragePooling2D → Compresses image features

models → Framework to combine layers

  • Sequential → Stack layers in sequence (like Lego blocks)

Expected output: Nothing! If no error appears, the import worked.


COMMAND 10: Load Pre-trained MobileNetV2 Model

What we’re doing: Loading Google’s pre-trained model as our foundation

Code to run:

base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet"
)

base_model.trainable = False

LINE-BY-LINE BREAKDOWN:

Line 1: base_model = MobileNetV2(...)

  • Creates the MobileNetV2 neural network
  • Stores it in the variable base_model

Parameter: input_shape=(224, 224, 3)

  • Tells the model to expect 224×224 pixel images
  • 3 = RGB color channels (Red, Green, Blue)
  • Why these numbers? MobileNetV2 was designed for this size

Parameter: include_top=False – CRITICAL!

  • “Top” = The final classification layer
  • The original model classifies 1000 categories
  • We remove it because we only need 2 categories (cat vs dog)
  • Analogy: Using a Swiss Army knife but removing the corkscrew (we don’t need it)

Parameter: weights="imagenet"

  • Loads the pre-trained weights (learned patterns)
  • ImageNet = A huge public image dataset (the 1000-category training subset has over a million images)
  • Without this: Random starting point (would need 100k images to train)
  • With this: Starts with expert knowledge (only needs 1k images to fine-tune)

Line 7: base_model.trainable = False

  • Freezes the pre-trained layers
  • Means: “Don’t change what you already learned”
  • Why? MobileNetV2 already knows edges, shapes, textures – we keep that knowledge
  • We’ll only train the new layers we add next

What happens behind the scenes:

  1. Downloads the pre-trained MobileNetV2 weights file
  2. Loads the full MobileNetV2 network
  3. Sets all of its layers to “frozen” mode

COMMAND 11: Add Custom Classification Layers

What we’re doing: Adding our own “brain” on top of MobileNetV2 to make cat vs dog decisions

Code to run:

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid")
])

LAYER-BY-LAYER EXPLANATION:

Line 1: model = models.Sequential([...])

  • Sequential = Stack layers in order (like a pipeline)
  • Data flows: Input → Layer 1 → Layer 2 → Layer 3 → Output

Layer 1: base_model

  • The frozen MobileNetV2 network
  • Output: A 7×7×1280 tensor (compressed image features)
  • Think of it as: “Here are 1280 features I detected in this image”

Layer 2: layers.GlobalAveragePooling2D()

  • What it does: Compresses the 7×7×1280 tensor into a 1×1280 vector
  • How: Takes the average of each 7×7 grid
  • Why: Neural networks need fixed-size input; this standardizes it
  • Analogy: Reading a book chapter and writing 1280 key points
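What GlobalAveragePooling2D does can be sketched with NumPy alone – averaging each 7×7 feature map down to a single number (the real layer does the same thing on the GPU):

```python
import numpy as np

# A fake MobileNetV2 output: 7x7 spatial grid, 1280 feature maps
features = np.random.rand(7, 7, 1280)

# Global average pooling = mean over the two spatial axes
pooled = features.mean(axis=(0, 1))

print(pooled.shape)  # (1280,) - one averaged number per feature map
```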

Layer 3: layers.Dense(1, activation="sigmoid")

  • Dense = Fully connected layer (every input connects to every output)
  • 1 = One output neuron
  • Why 1? Binary classification needs 1 output:
    • Output close to 0 → Cat
    • Output close to 1 → Dog
  • activation="sigmoid" → Squashes output to 0-1 range (probability)

The sigmoid function:

Input: any number (-∞ to +∞)
Output: number between 0 and 1
Example: sigmoid(-5) = 0.007 (almost 0 → cat)
         sigmoid(+5) = 0.993 (almost 1 → dog)
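A small NumPy sketch of the sigmoid function, reproducing the two values above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

print(round(sigmoid(-5), 3))  # 0.007 -> almost 0 -> cat
print(round(sigmoid(5), 3))   # 0.993 -> almost 1 -> dog
```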

Full data flow:

Input image (224×224×3) 
    ↓
MobileNetV2 (extracts 1280 features)
    ↓
GlobalAveragePooling (compresses to 1280 numbers)
    ↓
Dense layer (makes decision: 0-1)
    ↓
Output: 0.12 → Cat! (88% confident)
        0.87 → Dog! (87% confident)

COMMAND 12: Compile the Model

What we’re doing: Configuring HOW the model will learn

Code to run:

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

PARAMETER EXPLANATIONS:

optimizer="adam"

  • What’s an optimizer? The algorithm that adjusts the neural network’s weights
  • Why Adam? It’s the most popular – smart, fast, works well for most cases
  • Alternatives: SGD (slower but sometimes better), RMSprop
  • Analogy: Adam is like cruise control – automatically adjusts speed for optimal performance

Technical detail (optional): Adam stands for “Adaptive Moment Estimation” – it adapts the learning rate for each parameter automatically.

loss="binary_crossentropy"

  • What’s a loss function? Measures how “wrong” the model’s predictions are
  • Binary crossentropy = Standard loss for 2-class classification
  • How it works:
    • Prediction: 0.9 (dog), Actual: dog → Low loss (good!)
    • Prediction: 0.1 (cat), Actual: dog → High loss (bad!)
  • Goal: Training minimizes this loss

Why “crossentropy”? It comes from information theory – measures the difference between two probability distributions.
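A standalone sketch of the binary crossentropy formula, reproducing the “low loss / high loss” intuition above (Keras computes the same quantity internally, averaged over the batch):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # y_true: actual label (0 = cat, 1 = dog)
    # y_pred: the model's predicted probability of "dog"
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

good = binary_crossentropy(1, 0.9)  # predicted dog (0.9), actually a dog
bad = binary_crossentropy(1, 0.1)   # predicted cat (0.1), actually a dog

print(round(float(good), 3))  # ~0.105 (low loss - good prediction)
print(round(float(bad), 3))   # ~2.303 (high loss - bad prediction)
```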

metrics=["accuracy"]

  • Tracks accuracy during training
  • Accuracy = Percentage of correct predictions
  • Example: 80/100 correct = 80% accuracy
  • This is just for monitoring – doesn’t affect training

What compilation does:

  1. Prepares the computational graph
  2. Allocates memory on GPU
  3. Sets up the optimization algorithm
  4. Ready to train!

COMMAND 13: Train the Model (The Magic Happens!)

What we’re doing: Actually teaching the model to recognize cats vs dogs

Code to run:

history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=5
)

PARAMETER BREAKDOWN:

train_data

  • The training images (800 images, 80% of dataset)
  • Model learns from these

validation_data=val_data

  • The validation images (200 images, 20% of dataset)
  • Tests the model after each epoch
  • Crucial: Detects if model is just memorizing vs actually learning

epochs=5

  • What’s an epoch? One complete pass through all training images
  • 5 epochs means: The model sees all 800 images 5 times
  • Why 5? Balance between:
    • Too few (2-3) → Underfitting (doesn’t learn enough)
    • Too many (20+) → Overfitting (memorizes instead of learning)

What happens during training:

Epoch 1:

  1. Shows 800 images to the model (in batches of 32)
  2. Model makes predictions
  3. Calculates loss (how wrong it was)
  4. Adjusts weights to reduce loss
  5. Tests on 200 validation images
  6. Reports accuracy

Epoch 2-5: Repeats the process

Expected output:

Epoch 1/5
25/25 [==============================] - 45s 2s/step - loss: 0.4521 - accuracy: 0.7875 - val_loss: 0.2134 - val_accuracy: 0.9150
Epoch 2/5
25/25 [==============================] - 42s 2s/step - loss: 0.2108 - accuracy: 0.9125 - val_loss: 0.1456 - val_accuracy: 0.9450
Epoch 3/5
25/25 [==============================] - 41s 2s/step - loss: 0.1523 - accuracy: 0.9375 - val_loss: 0.1123 - val_accuracy: 0.9550
Epoch 4/5
25/25 [==============================] - 40s 2s/step - loss: 0.1234 - accuracy: 0.9500 - val_loss: 0.0987 - val_accuracy: 0.9650
Epoch 5/5
25/25 [==============================] - 39s 2s/step - loss: 0.1087 - accuracy: 0.9587 - val_loss: 0.0892 - val_accuracy: 0.9700

UNDERSTANDING THE OUTPUT:

25/25 → Total batches (800 images ÷ 32 per batch = 25)

45s 2s/step → Total time and time per batch

loss: 0.4521 → Training loss (decreases each epoch = good!)

accuracy: 0.7875 → Training accuracy (78.75%)

val_loss: 0.2134 → Validation loss

val_accuracy: 0.9150 → Validation accuracy (91.5%) ← THIS IS KEY!

WHAT TO LOOK FOR:

Good signs:

  • Accuracy increasing each epoch
  • Loss decreasing each epoch
  • Validation accuracy close to training accuracy (within 5-10%)

⚠️ Warning signs:

  • Overfitting: Training accuracy 98%, validation 70% (model memorized training data)
  • Underfitting: Both accuracies stuck at 60% (model isn’t learning)

Why validation accuracy > training accuracy here?

  • We froze MobileNetV2 layers (they don’t change)
  • Only training 1 layer (the final Dense layer)
  • Sometimes validation gets lucky with easier examples

Training time: 3-4 minutes on Colab’s free GPU (vs 30+ minutes on CPU)


COMMAND 14: View Model Architecture

What we’re doing: Seeing the complete structure of our neural network

Code to run:

model.summary()

Expected output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
mobilenetv2_1.00_224 (Functional) (None, 7, 7, 1280)   2,257,984 
_________________________________________________________________
global_average_pooling2d (GlobalAveragePooling2D) (None, 1280) 0         
_________________________________________________________________
dense (Dense)                (None, 1)                 1,281     
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________

WHAT THIS MEANS:

Layer 1: MobileNetV2

  • Output Shape: (None, 7, 7, 1280)
    • None = batch size (variable)
    • 7×7 = spatial dimensions (compressed from 224×224)
    • 1280 = feature maps (detectors for different patterns)
  • Params: 2,257,984 parameters (weights)
  • Status: Non-trainable (frozen)

Layer 2: GlobalAveragePooling2D

  • Output Shape: (None, 1280) – flattened to a vector
  • Params: 0 (it’s just averaging, no learnable weights)

Layer 3: Dense

  • Output Shape: (None, 1) – single output (cat vs dog)
  • Params: 1,281
    • Why 1,281? (1280 inputs × 1 output) + 1 bias = 1,281
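The parameter count and forward pass of that Dense layer can be sketched in NumPy (zero-initialized weights here purely for illustration; real training learns these values):

```python
import numpy as np

n_inputs, n_outputs = 1280, 1

# One weight per input-output connection, plus one bias per output
W = np.zeros((n_inputs, n_outputs))
b = np.zeros(n_outputs)
n_params = W.size + b.size

print(n_params)  # 1281, matching model.summary()

# Forward pass: weighted sum + bias, then sigmoid
x = np.random.rand(n_inputs)          # the 1280 pooled features
output = 1 / (1 + np.exp(-(x @ W + b)))
print(output)  # with all-zero weights, sigmoid(0) = 0.5
```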

Key insight:

  • Total params: 2,259,265
  • Trainable: Only 1,281 (about 0.06%!)
  • Non-trainable: 2,257,984 (frozen MobileNetV2)

This is the power of transfer learning – we only train about 0.06% of the network but get 95%+ accuracy!


COMMAND 15: Save the Model (.h5 format)

What we’re doing: Saving your trained model so you don’t lose it when Colab disconnects

Code to run:

model.save("/content/dog_cat_model.h5")

What happens:

  • Creates a file dog_cat_model.h5 (about 9 MB)
  • Contains the entire model (architecture + weights)
  • .h5 = HDF5 format (older Keras format)

You’ll see this warning:

WARNING:absl:You are saving your model as an HDF5 file via `model.save()`. 
This file format is considered legacy. We recommend using instead the native 
Keras format, e.g. `model.save('my_model.keras')`.

What this means: .h5 still works but is outdated. Let’s use the new format!


COMMAND 16: Save the Model (.keras format – RECOMMENDED)

Code to run:

model.save("/content/dog_cat_model.keras")

Why .keras is better:

  • Modern format (future-proof)
  • Faster loading
  • Better compression
  • Official Keras recommendation

To download the model:

  1. Click the folder icon (left sidebar)
  2. Find dog_cat_model.keras
  3. Click the 3 dots → Download
  4. Save it on your computer!

Now you can:

  • Share your model with friends
  • Load it later without retraining
  • Use it in a web app or mobile app

COMMAND 17: Create Upload Button for Testing

What we’re doing: Adding a button to upload test images from your computer

Code to run:

from google.colab import files

uploaded = files.upload()

What this does:

  1. Creates a “Choose Files” button
  2. Click it and select any cat or dog image from your computer
  3. Uploads the image to Colab
  4. Stores the filename in the uploaded variable

Expected behavior:

  • You’ll see a file picker dialog
  • Select an image (JPG, PNG, etc.)
  • Wait for upload (shows progress bar)
  • When done, the cell completes

Pro tip: Test with images NOT from your training set! Use:

  • Photos from Google Images
  • Your own pet photos
  • Random internet images

Why? This tests if the model generalizes (works on new, unseen images).


COMMAND 18: Make Predictions on Your Test Image

What we’re doing: The moment of truth – testing if your AI actually works!

Code to run:


import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt

IMG_SIZE = (224, 224)

img_path = list(uploaded.keys())[0]   # automatically gets file name

img = image.load_img(img_path, target_size=IMG_SIZE)
img_array = image.img_to_array(img) / 255.0
img_array = np.expand_dims(img_array, axis=0)

prediction = model.predict(img_array)

plt.imshow(img)
plt.axis("off")

if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")

COMPLETE LINE-BY-LINE BREAKDOWN:

Lines 1-3: Import necessary libraries

import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt

  • numpy → For array operations
  • image → Keras image utilities for loading/processing
  • matplotlib → For displaying the image

Line 5: IMG_SIZE = (224, 224)

  • Same size we used for training
  • Critical: Model expects 224×224 images

Line 7: img_path = list(uploaded.keys())[0]

  • uploaded is a dictionary: {'filename.jpg': file_data}
  • .keys() gets the filename
  • [0] gets the first (and only) uploaded file
  • Result: Stores the filename string (e.g., “my_dog.jpg”)

Line 9: img = image.load_img(img_path, target_size=IMG_SIZE)

  • Loads the image from the file
  • target_size=IMG_SIZE → Automatically resizes to 224×224
  • Why resize? Your image might be 4000×3000 or 800×600 – model needs exactly 224×224

Line 10: img_array = image.img_to_array(img) / 255.0

  • image.img_to_array(img) → Converts image to NumPy array
    • Result: Shape (224, 224, 3) – height × width × RGB channels
    • Values: 0-255 (raw pixel values)
  • / 255.0 → Normalizes to 0-1 range
    • Why? Remember, we trained with normalized images (rescale=1./255)
    • Must match training preprocessing!

Line 11: img_array = np.expand_dims(img_array, axis=0)

  • Critical transformation!
  • Before: Shape (224, 224, 3) – single image
  • After: Shape (1, 224, 224, 3) – batch of 1 image
  • Why? Model expects batches, even if batch size = 1
  • axis=0 adds a new dimension at the front

Visual representation:

Original: [[[R, G, B], [R, G, B], ...]]  ← 2D grid of pixels
After:    [[[[R, G, B], [R, G, B], ...]]] ← Batch containing 1 image
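The shape change can be verified with NumPy alone:

```python
import numpy as np

# A single fake image: height x width x RGB channels
img_array = np.zeros((224, 224, 3))
print(img_array.shape)  # (224, 224, 3)

# Add a batch dimension at the front, as the model expects
batched = np.expand_dims(img_array, axis=0)
print(batched.shape)    # (1, 224, 224, 3) - a batch of one image
```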

Line 13: prediction = model.predict(img_array)

  • THE PREDICTION HAPPENS HERE!
  • Model processes the image through all layers
  • Returns: Array of probabilities
  • Shape: [[0.87]] (2D array with one value)

How prediction works internally:

  1. Image goes through MobileNetV2 (extracts 1280 features)
  2. GlobalAveragePooling compresses features
  3. Dense layer with sigmoid outputs probability
  4. Output: Number between 0 and 1

Line 15-16: Display the image

plt.imshow(img)
plt.axis("off")

  • Shows the uploaded image
  • Turns off axis numbers for cleaner display

Lines 18-21: Interpret the prediction


if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")

Understanding prediction[0][0]:

  • prediction = [[0.87]] (2D array)
  • prediction[0] = [0.87] (first element of outer array)
  • prediction[0][0] = 0.87 (actual probability value)

Decision logic:

  • If > 0.5: More likely a dog (closer to 1)
  • If < 0.5: More likely a cat (closer to 0)
  • Threshold 0.5 is standard for binary classification

Example outputs:

prediction[0][0] = 0.92 → "Prediction: Dog 🐶" (92% confident)
prediction[0][0] = 0.13 → "Prediction: Cat 🐱" (87% confident it's NOT a dog)
prediction[0][0] = 0.51 → "Prediction: Dog 🐶" (barely, only 51% confident)
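The decision logic can be packaged as a small helper that also reports confidence. The function name `interpret` is hypothetical – the tutorial code uses an inline if instead:

```python
def interpret(probability, threshold=0.5):
    # probability is the sigmoid output: near 0 = cat, near 1 = dog
    if probability > threshold:
        return f"Dog ({probability:.0%} confident)"
    return f"Cat ({1 - probability:.0%} confident)"

print(interpret(0.92))  # Dog (92% confident)
print(interpret(0.13))  # Cat (87% confident)
print(interpret(0.51))  # Dog (51% confident)
```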

Expected full output:

  • Image of your uploaded photo
  • Text: “Prediction: Dog 🐶” or “Prediction: Cat 🐱”

Complete Command Summary with Correct Order

Here’s the complete numbered sequence of all commands:

SETUP PHASE:

  1. Download dataset from Google Drive link
  2. Open Google Colab and create new notebook
  3. Create folder structure and upload images to Colab

VERIFICATION PHASE:

  4. !ls – Verify main folders
  5. !ls animals – Verify subfolder structure
  6. Display sample image with PIL and matplotlib
  7. !ls animals/cats | head and !ls animals/dogs | head – List sample files

DATA PREPARATION PHASE:

  8. Import ImageDataGenerator and prepare train/validation data splits
  9. Import TensorFlow/Keras libraries (MobileNetV2, layers, models)

MODEL BUILDING PHASE:

  10. Load pre-trained MobileNetV2 base model
  11. Add custom classification layers on top
  12. Compile the model with optimizer, loss, and metrics

TRAINING PHASE:

  13. Train the model with model.fit()
  14. View model architecture with model.summary()

SAVING PHASE:

  15. Save model as .h5 file (legacy format)
  16. Save model as .keras file (recommended format)

TESTING PHASE:

  17. Create upload button for test images
  18. Make predictions on uploaded images


What You’ve Accomplished! 🎉

Congratulations! You’ve just built a real AI model from scratch. Let’s recap what you’ve learned:

Technical Skills Gained:

✅ Set up a cloud-based GPU environment (Google Colab)
✅ Prepared image datasets with proper folder structure
✅ Used ImageDataGenerator for automated data preprocessing
✅ Implemented transfer learning with MobileNetV2
✅ Built a custom neural network architecture
✅ Trained a deep learning model with validation
✅ Saved and loaded trained models
✅ Made predictions on new images

Key Concepts Mastered:

✅ What deep learning and neural networks are
✅ How CNNs process images differently than regular neural networks
✅ The power of transfer learning (reusing pre-trained models)
✅ Training vs validation data and why we split them
✅ Image normalization and preprocessing
✅ Binary classification with sigmoid activation
✅ Model compilation (optimizer, loss, metrics)

Real-World Performance:

  • Accuracy: 80-97% (depending on your dataset quality)
  • Training time: 3-5 minutes on free GPU
  • Model size: ~9 MB (portable and shareable)
  • Images needed: Only 1,000 (vs 100,000+ from scratch)

Troubleshooting Common Issues

Issue 1: “Found 3 classes instead of 2”

Cause: Hidden .ipynb_checkpoints folder in the animals directory

Fix: Run this in a code cell:


!rm -rf /content/animals/.ipynb_checkpoints

Issue 2: Low accuracy (below 70%)

Possible causes:

  • Poor quality images (blurry, wrong labels, duplicates)
  • Too few epochs (try 7-10 instead of 5)
  • Dataset imbalance (e.g., 700 cats, 300 dogs)

Fix: Check your dataset quality manually

Issue 3: “ResourceExhausted” error

Cause: GPU ran out of memory

Fix: Reduce batch size:


BATCH_SIZE = 16  # instead of 32

Issue 4: Model predicts same class for everything

Cause: Model didn’t learn properly (overfitting or underfitting)

Fix:

  • Check if images uploaded correctly
  • Increase training epochs to 10
  • Verify rescale=1./255 is applied

Issue 5: Upload button doesn’t appear

Cause: Code didn’t run completely

Fix: Run the cell again and wait for the “Choose Files” button


Next Steps: Level Up Your AI Skills

Beginner Challenges:

  1. Improve accuracy: Try training for 10 epochs instead of 5
  2. Add confidence scores: Print the exact probability (e.g., “Dog: 87% confident”)
  3. Test multiple images: Modify code to upload and predict 5 images at once
  4. Visualize training: Plot accuracy curves using matplotlib
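Challenge 2 only needs a little arithmetic on the sigmoid output. A minimal sketch, using a hard-coded prediction value in place of a real model.predict() call:

```python
def describe_prediction(p):
    """Turn a sigmoid output (0.0-1.0) into a labeled confidence string.

    With the class ordering used in this tutorial, values near 1.0
    mean dog and values near 0.0 mean cat.
    """
    if p > 0.5:
        label, confidence = "Dog", p
    else:
        label, confidence = "Cat", 1 - p
    return f"{label}: {confidence * 100:.0f}% confident"

# In the notebook, p would come from model.predict(img_array)[0][0];
# hard-coded here for illustration.
print(describe_prediction(0.87))  # → Dog: 87% confident
print(describe_prediction(0.10))  # → Cat: 90% confident
```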

Intermediate Challenges:

  1. Add data augmentation: Flip, rotate, zoom images during training for better generalization
  2. Try different architectures: Replace MobileNetV2 with ResNet50 or VGG16
  3. Multi-class classification: Add a third category (e.g., cats, dogs, birds)
  4. Fine-tune frozen layers: Unfreeze last 10 layers of MobileNetV2 for higher accuracy
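Challenge 1 above is a small change to the training generator. A hedged sketch (the exact augmentation values are illustrative, not tuned):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applies only to the training generator;
# the validation generator should keep just the rescale.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,    # a mirrored cat is still a cat
    rotation_range=20,       # rotate up to 20 degrees
    zoom_range=0.2,          # zoom in/out up to 20%
    width_shift_range=0.1,   # shift horizontally up to 10%
    height_shift_range=0.1,  # shift vertically up to 10%
)
```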

Advanced Projects:

  1. Build a web app: Use Streamlit or Gradio to create an interface
  2. Deploy to mobile: Convert model to TensorFlow Lite for Android/iOS
  3. Real-time video classification: Process webcam feed frame-by-frame
  4. Create your own dataset: Scrape images from the web and build a custom classifier
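Project 1 can be surprisingly short with Gradio. A sketch, assuming the model was saved as cat_dog_model.h5 (the filename and the 224×224 preprocessing must match whatever your notebook actually used):

```python
import numpy as np
import gradio as gr
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("cat_dog_model.h5")  # assumed filename from the saving step

def classify(img):
    # Gradio hands us a numpy array; preprocess exactly as during training
    img = Image.fromarray(img).resize((224, 224))
    x = np.array(img)[None, ...] / 255.0   # add batch dimension, normalize
    p = float(model.predict(x)[0][0])      # sigmoid output: near 1.0 = dog
    return {"dog": p, "cat": 1 - p}

gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label()).launch()
```

Running this cell in Colab prints a public link you can open on your phone.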

Understanding Your Results Better

What does 91% accuracy actually mean?

  • Out of 100 test images, the model correctly identifies 91
  • 9 images are misclassified (false positives/negatives)
  • Is 91% good? Yes! Professional models are 95-98%, but they use:
    • 100,000+ images
    • Advanced architectures
    • Days of training
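Concretely, accuracy is just correct predictions divided by total, and the false positives/negatives are the two ways a prediction can go wrong. A tiny illustration with made-up labels (1 = dog, 0 = cat):

```python
# Made-up ground truth and predictions for 10 images (1 = dog, 0 = cat)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Treating "dog" as the positive class:
false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # cat called dog
false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # dog called cat

print(f"Accuracy: {accuracy:.0%}")  # → Accuracy: 80%
print(f"False positives: {false_pos}, false negatives: {false_neg}")  # → 1 and 1
```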

Why does validation accuracy fluctuate?

You might see:

Epoch 1: 91%
Epoch 2: 94%
Epoch 3: 92%

Reasons:

  • Validation set is small (200 images) – random variation matters
  • Some batches are naturally harder than others
  • Model is still learning and adjusting

What to watch: The overall trend (should increase or stay stable)

When is your model actually good?

✅ Validation accuracy within 5% of training accuracy
✅ Both accuracies above 80%
✅ Model correctly predicts YOUR test images (not just training data)
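The first two checks can be scripted against the history that model.fit() returns. A minimal sketch using a hard-coded history dict in place of the real one:

```python
def looks_healthy(history, gap=0.05, floor=0.80):
    """Apply the two numeric checks: small train/val gap, both above 80%."""
    train_acc = history["accuracy"][-1]    # final-epoch training accuracy
    val_acc = history["val_accuracy"][-1]  # final-epoch validation accuracy
    return abs(train_acc - val_acc) <= gap and min(train_acc, val_acc) >= floor

# In the notebook this would be model.fit(...).history; hard-coded for illustration.
good = {"accuracy": [0.85, 0.93, 0.95], "val_accuracy": [0.84, 0.90, 0.92]}
overfit = {"accuracy": [0.90, 0.97, 0.99], "val_accuracy": [0.80, 0.79, 0.78]}

print(looks_healthy(good))     # → True
print(looks_healthy(overfit))  # → False  (training far ahead of validation)
```

The third check, predicting correctly on your own uploaded images, still has to be done by hand.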


The Science Behind Transfer Learning

Why MobileNetV2 works so well:

What it learned from ImageNet:

  • Low-level features (Layer 1-20): Edges, corners, colors, textures
  • Mid-level features (Layer 21-50): Shapes, patterns, object parts
  • High-level features (Layer 51-88): Complex objects, scenes

Universal knowledge: These patterns are universal! Whether identifying cats, cars, or planes, you need to detect edges and shapes first.

Our specialization: We only train the final layer to combine these universal features specifically for cat vs dog detection.

Analogy:

  • From scratch: Teaching someone to read, write, and then become a lawyer (10 years)
  • Transfer learning: Hiring a college graduate and training them in law (2 years)

Real-World Applications of This Technique

Your cat vs dog classifier uses the same technology behind:

  1. Medical imaging: Detecting tumors in X-rays and MRIs
  2. Self-driving cars: Identifying pedestrians, cars, traffic signs
  3. Quality control: Spotting defective products in manufacturing
  4. Wildlife conservation: Counting endangered species from camera traps
  5. Agriculture: Detecting plant diseases from leaf photos
  6. Security: Facial recognition systems
  7. Retail: Visual search (“find similar products”)

The skill you learned is in-demand! Companies pay $80,000-$150,000/year for computer vision engineers.


Final Thoughts

You’ve completed a journey that would have seemed impossible just a few years ago. In 2012, training an image classifier required:

  • PhD-level knowledge
  • $10,000+ in hardware
  • Weeks of training time
  • 1,000,000+ images

Today, you did it with:

  • Basic Python knowledge
  • Free cloud resources
  • 5 minutes of training
  • 1,000 images

This is the democratization of AI in action.

The concepts you learned here – CNNs, transfer learning, data preprocessing, model training – are the foundation of modern computer vision. Whether you’re building a startup, pursuing a career in AI, or just exploring as a hobby, you now have real, practical skills.

Remember: Every expert was once a beginner. The model you built today might seem simple, but it’s using the same principles as models that:

  • Diagnose cancer
  • Power autonomous vehicles
  • Translate languages in real-time

Keep building, keep learning, and most importantly – have fun with AI! 🚀
