Exploring the World Through Data Science and Machine Learning

Model Explainability with Grad-CAM in PyTorch

January 01, 2023

This tutorial shows how to use Grad-CAM (Gradient-weighted Class Activation Mapping) to interpret the output of a neural network. Grad-CAM is a visualization technique that highlights the image regions a convolutional neural network (CNN) relied on most when making a prediction. Although Grad-CAM works with any CNN, it is mostly used with image classification models. This tutorial uses PyTorch, and I also wrote a parallel tutorial for TensorFlow.

Siamese Networks with FastAI - Evaluation

September 30, 2022

This post shows how to load and evaluate the model we built in the previous post.

Siamese Networks with FastAI - Update

September 29, 2022

This post is a walkthrough of creating a Siamese network with FastAI. I had planned to simply use the tutorial from FastAI, but I had to change so much to load the model and make everything work with the latest versions that I decided to turn it into a blog post. It is very similar to my other post on Siamese Networks with FastAI, except that this one is followed by a post on how to evaluate the model.

Distributions

September 17, 2022

Distributions are super important. In this post, I’ll talk about some common distributions, how to plot them, and what they can be used for.

Generating GitHub Personal Access Tokens

July 12, 2022

This post contains instructions for how to work with personal access tokens on GitHub.

Test Set Metrics on Unbalanced Datasets: Part II

July 06, 2022

This is Part II of two posts demonstrating how to get test metrics from a highly unbalanced dataset. In Part I, I showed how you could theoretically estimate the precision, recall, and F1 score on highly unbalanced data. In this part, I’ll do the same thing with a real dataset: the Adult Income Dataset.

Test Set Metrics on Unbalanced Datasets: Part I

July 06, 2022

In this post, I’m going to walk through how to solve a problem that you might run into when evaluating models on highly unbalanced datasets. Let’s imagine you’re classifying whether people have a really rare disease or not. You asked 100,000 people at random and only found 10 instances of the disease. How are you going to be able to get enough data to train a machine learning model? Fortunately, you know of a treatment center that treats this specific disease.
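To make the scenario above concrete, here is a minimal sketch of one standard way to estimate precision and F1 at the true prevalence. It is not necessarily the method used in the post. It assumes that recall and the false positive rate were measured on some labeled test set. The values `recall = 0.90` and `false_positive_rate = 0.01` are hypothetical, chosen only for illustration, and the function names are my own. Only the prevalence (10 cases in 100,000 people) comes from the example above.

```python
def precision_at_prevalence(recall, false_positive_rate, prevalence):
    """Estimate precision in a population with the given prevalence.

    Uses Bayes' rule: of everyone flagged positive, what fraction is
    truly positive, given the base rate of the condition?
    """
    true_pos = recall * prevalence                       # share of population: sick and flagged
    false_pos = false_positive_rate * (1 - prevalence)   # share of population: healthy and flagged
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Prevalence from the example in the text: 10 cases among 100,000 people.
prevalence = 10 / 100_000

# Hypothetical classifier characteristics (assumptions, not from the post).
recall = 0.90
false_positive_rate = 0.01

precision = precision_at_prevalence(recall, false_positive_rate, prevalence)
f1 = f1_score(precision, recall)

print(f"prevalence: {prevalence:.4%}")
print(f"estimated precision: {precision:.4f}")
print(f"estimated F1: {f1:.4f}")
```

Under these assumed numbers, a classifier that looks strong on a balanced test set still has precision below 1% in the real population, because the healthy false positives far outnumber the true cases.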
