Exploring the world through data science

# Principal Component Analysis

This post is an introduction to principal component analysis (PCA) for the NOVA Deep Learning Meetup.

# Exploring Decision Trees in R

This post explores decision trees for the NOVA Deep Learning Meetup. It is based on chapter 8 of An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. It also discusses methods that improve decision tree performance, such as bagging, random forests, and boosting. The same material is covered in two posts, one in R and one in Python.

# Exploring Decision Trees in Python

This post explores decision trees for the NOVA Deep Learning Meetup. It is based on chapter 8 of An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. It also discusses methods that improve decision tree performance, such as bagging, random forests, and boosting. The same material is covered in two posts, one in R and one in Python.

# DNA Splice Junctions II: Logistic Regression from Scratch

Now that we’ve cleaned and prepared the data, let’s try classifying it using logistic regression. Logistic regression is a popular machine learning algorithm for classification because it is fast to train and often accurate despite its simplicity.

# DNA Splice Junctions I: Cleaning and Preparing the Data

Splice junctions are locations on strands of DNA or RNA where superfluous sections are removed when proteins are created. During splicing, the sections known as introns are removed and the remaining sections, known as exons, are joined together. Being able to identify these junctions is useful but time-consuming. This raises the question: can splice junctions in DNA be identified with machine learning?

# Kangaroos and Wallabies III: Classifying the Data

In this notebook, we’re going to take our augmented dataset and build a convolutional neural network to classify the images.