# Exploring the World Through Data Science and Machine Learning

# How Violin and Box Plots Obscure Data

Data visualization is one of the most important tools in a data scientist's toolkit, used both to explore data and to explain it. Despite its significance, I frequently encounter exploratory data analysis (EDA) visualizations that mask important aspects of the data, which leads people to misunderstand their data and make bad decisions based on it. In my experience, violin and box plots are the worst offenders. In this post, I demonstrate some of the drawbacks of violin and box plots and suggest alternative visualizations that represent the data more clearly.
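To make the box-plot problem concrete, here is a small hypothetical example (not taken from the post): two datasets with identical five-number summaries, which, when there are no outliers, is all a basic box plot draws, yet with very different shapes. It uses NumPy's default linear-interpolation percentiles; other quartile conventions could shift the numbers slightly.

```python
import numpy as np

def five_number_summary(values):
    """Min, Q1, median, Q3, max: what a basic box plot draws when no point
    lies beyond 1.5 * IQR (so the whiskers reach the min and max)."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return (float(arr.min()), float(q1), float(median), float(q3), float(arr.max()))

# Hypothetical toy data: evenly spread values vs. two tight clusters
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
two_clusters = [0, 2, 2, 2, 4, 6, 6, 6, 8]

print(five_number_summary(evenly_spread))  # (0.0, 2.0, 4.0, 6.0, 8.0)
print(five_number_summary(two_clusters))   # (0.0, 2.0, 4.0, 6.0, 8.0)

# The boxes are identical, but the interior of the box looks nothing alike:
print(sum(2 < v < 6 for v in evenly_spread))  # 3 points strictly inside (Q1, Q3)
print(sum(2 < v < 6 for v in two_clusters))   # 1 point strictly inside (Q1, Q3)
```

A strip plot or histogram of the raw values would reveal the two clusters immediately, while the box plot hides them.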

# JQL Cheat Sheet

Jira Query Language (JQL) is a powerful tool that allows users to perform advanced searches in Jira. This post contains some of my tips and tricks for working with it.

# Neural Tangent Kernels

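Neural Tangent Kernels (NTKs) are an exciting topic in machine learning that connects neural networks and kernel methods: the NTK describes how a network's outputs change during gradient-descent training, and in the infinite-width limit it lets that training be analyzed as a kernel method.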
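For reference, here is the standard definition, written for a scalar-output network $f(x;\theta)$ with parameters $\theta$ (the notation is chosen here for illustration). The NTK is the inner product of the parameter gradients at two inputs:

$$
\Theta_t(x, x') = \nabla_\theta f(x;\theta_t)^\top \, \nabla_\theta f(x';\theta_t)
$$

Under gradient flow with learning rate $\eta$ on the squared loss $\tfrac{1}{2}\lVert f(X;\theta) - Y\rVert^2$ over training inputs $X$ and targets $Y$, the chain rule gives

$$
\frac{d}{dt} f(x;\theta_t) = -\eta \, \Theta_t(x, X)\,\big(f(X;\theta_t) - Y\big),
$$

so the kernel determines how every prediction moves during training. In the infinite-width limit (under the appropriate parameterization), $\Theta_t$ stays essentially constant, which is what makes the network's training dynamics equivalent to kernel regression.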

# Discovering Latent Knowledge Without Supervision

This post walks through recent work by Burns et al., *Discovering Latent Knowledge in Language Models Without Supervision*. Rather than training the language model itself, the paper finds latent knowledge in the model's internal activations without using any labels. Their method answers yes-no questions accurately by identifying a direction in activation space that satisfies logical consistency properties, such as a statement and its negation having opposite truth values.
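A minimal NumPy sketch of the unsupervised objective, following the paper's Contrast-Consistent Search (CCS) loss for a linear probe with a sigmoid output: a consistency term pushes p(x⁺) toward 1 − p(x⁻), and a confidence term rules out the degenerate answer of 0.5 for both. The activations are assumed to be precomputed arrays, the function names are hypothetical, and the optimization loop that fits `theta` and `bias` by minimizing this loss is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def normalize(h, eps=1e-8):
    """Standardize one set of activations (rows = examples) column-wise.

    Applied separately to the "yes" and "no" sets so the probe cannot simply
    detect which answer was appended to the prompt.
    """
    return (h - h.mean(axis=0, keepdims=True)) / (h.std(axis=0, keepdims=True) + eps)

def probe(theta, bias, h):
    """Linear probe: probability that each statement is true."""
    return sigmoid(h @ theta + bias)

def ccs_loss(theta, bias, h_pos, h_neg):
    """CCS loss. h_pos[i] / h_neg[i]: activations for statement i answered yes / no."""
    p_pos, p_neg = probe(theta, bias, h_pos), probe(theta, bias, h_neg)
    consistency = (p_pos - (1.0 - p_neg)) ** 2  # a statement and its negation should sum to 1
    confidence = np.minimum(p_pos, p_neg) ** 2   # penalizes the trivial p_pos = p_neg = 0.5
    return float(np.mean(consistency + confidence))

def predict(theta, bias, h_pos, h_neg):
    """Average both views into one probability that the statement is true."""
    p_pos, p_neg = probe(theta, bias, h_pos), probe(theta, bias, h_neg)
    return 0.5 * (p_pos + (1.0 - p_neg))

# Usage with random stand-in activations (a real run would use the model's hidden states)
rng = np.random.default_rng(0)
h_pos = normalize(rng.normal(size=(8, 4)))
h_neg = normalize(rng.normal(size=(8, 4)))
theta, bias = rng.normal(size=4), 0.0
print(ccs_loss(theta, bias, h_pos, h_neg))
```

Note that no labels appear anywhere in the loss: the only training signal is the logical relationship between each statement and its negation.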

# Sigmoid Functions for Mathematical Modeling

Sigmoid functions are mathematical functions with a characteristic “S” shape. They are commonly used in mathematical modeling to represent a variety of phenomena, such as the probability of an event occurring, the growth of a population, or the spread of a disease. They naturally capture growth that starts slowly, then accelerates, and finally levels off instead of increasing without bound. I use sigmoids all the time for fitting data: they are smooth and differentiable, and it is easy to impose boundary conditions on them. In this post, I provide some tips for adapting them to different problem cases.
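A minimal sketch, assuming that "boundary conditions" means fixing the lower and upper asymptotes in advance: with the bounds known, a logit transform turns the curve into a straight line, so the steepness and midpoint can be fit with ordinary least squares. The names `bounded_sigmoid` and `fit_known_bounds` are hypothetical, and this linearization is one option among several; nonlinear least squares is the usual alternative when the data are noisy.

```python
import numpy as np

def bounded_sigmoid(t, lower, upper, k, t0):
    """Logistic curve rising from `lower` to `upper` with steepness k and midpoint t0."""
    return lower + (upper - lower) / (1.0 + np.exp(-k * (t - t0)))

def fit_known_bounds(t, y, lower, upper):
    """Estimate (k, t0) when the asymptotes are fixed in advance.

    Rearranging y = L + (U - L) / (1 + exp(-k (t - t0))) gives
    log((y - L) / (U - y)) = k * t - k * t0, a straight line in t.
    Requires lower < y < upper strictly.
    """
    z = np.log((y - lower) / (upper - y))
    slope, intercept = np.polyfit(t, z, 1)
    return slope, -intercept / slope

# Usage: recover known parameters from noise-free synthetic data
t = np.linspace(0, 10, 50)
y = bounded_sigmoid(t, 2.0, 12.0, 1.5, 4.0)
k, t0 = fit_known_bounds(t, y, 2.0, 12.0)
print(k, t0)  # close to 1.5 and 4.0
```

The transform amplifies noise for points near either bound (where `y - lower` or `upper - y` is tiny), so with real data it can help to drop or down-weight those points.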

# SHAP Values on Tabular Data

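This post explores SHAP values. SHAP stands for SHapley Additive exPlanations and is a way of explaining the output of a machine learning model by attributing each prediction to the model's input features, using Shapley values from cooperative game theory.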
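To show what the Shapley part of SHAP computes, here is a brute-force sketch for a single prediction. It is not the `shap` library's algorithm: as a simplifying assumption, "missing" features are replaced by one fixed baseline point rather than averaged over a background dataset, and every coalition is enumerated, so it only scales to a few features. `shapley_values` and the toy `model` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline point.

    A feature outside a coalition takes its baseline value. This costs O(2^n)
    calls to f, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output with only the features in `coalition` set to their real values
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi

def model(z):
    """Hypothetical toy model with one main effect and one interaction."""
    return 3 * z[0] + 2 * z[1] * z[2]

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
print(sum(phi), model(x) - model(baseline))  # additivity: the two agree (15, up to rounding)
```

Here the main effect of feature 0 receives 3, and the interaction term `2 * z[1] * z[2]` is split evenly between features 1 and 2 (6 each), so the attributions add up to the difference between the prediction and the baseline output.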

# Dealing with Skewed Data

In this post I want to talk about some techniques for dealing with skewed data, especially left-skewed data. Left-skewed data is a bit of a rarity. It’s something you don’t see very often, kind of like a left-handed unicorn. It can also be difficult to work with if you’re not prepared.
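One common way to handle left skew (not necessarily the approach the post settles on) is to reflect the data so the long tail points right and then apply a log transform. The helper names and sample values below are hypothetical; skewness is computed by hand to keep the sketch NumPy-only.

```python
import numpy as np

def sample_skewness(x):
    """Biased sample skewness: mean cubed deviation divided by std cubed."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.std(x) ** 3)

def reflect_and_log(x):
    """Reflect left-skewed data so it becomes right-skewed, then log-transform.

    Adding 1 maps the largest original value to log(1) = 0 instead of log(0).
    The ordering is reversed: large original values become small outputs.
    """
    x = np.asarray(x, dtype=float)
    return np.log(x.max() + 1.0 - x)

left_skewed = np.array([19, 19, 19, 18, 18, 17, 15, 10])  # hypothetical data
transformed = reflect_and_log(left_skewed)
print(sample_skewness(left_skewed))  # negative: long left tail
print(sample_skewness(transformed))  # positive but much closer to zero
```

Because the reflection flips the direction of the variable, any model coefficients or plots built on the transformed values need to be read with that reversal in mind.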