Exploring the World Through Data Science and Machine Learning

A Guide to the Python Disassembler Module

July 24, 2024

The dis module is a great tool for understanding how code runs. While I mainly use it out of curiosity, it can also be valuable for optimization and debugging. The module disassembles Python bytecode, the low-level intermediate representation that the interpreter actually executes, into human-readable instructions. By examining bytecode, programmers can glimpse the Python interpreter’s view of their code, shedding light on performance characteristics and behaviors that aren’t apparent at the source code level.
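As a quick illustration, here is a minimal sketch using only the standard library. `add_one` is a hypothetical function chosen for the example. The exact opcode names depend on your Python version (for example, Python 3.11 replaced `BINARY_ADD` with the more general `BINARY_OP`), so your listing may differ from someone else’s.

```python
import dis

def add_one(x):
    """Hypothetical function used only to have something to disassemble."""
    return x + 1

# Print a human-readable listing of the function's bytecode.
dis.dis(add_one)

# dis also accepts a source string, which it compiles (but does not run) first.
dis.dis("y = add_one(41)")

# For programmatic use, get_instructions yields Instruction objects.
opnames = [instr.opname for instr in dis.get_instructions(add_one)]
print(opnames)
```

The programmatic form is handy when you want to compare two implementations of the same function, for example by counting how many instructions each one compiles to.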

YData Profiling Tutorial

April 03, 2024

YData Profiling used to be known as pandas-profiling, but it’s moved to a new name and a new home. I talked about it in my post on cleaning DNA splice junction data, but since it was buried in that post and the name has changed, I thought I would write a quick tutorial that covers only YData Profiling. There isn’t much to demo here because it does so much of the work for you, but I’ll still go over it.

Geospatial Data Plotting Tutorial

April 03, 2024

This tutorial shows how to plot geospatial data on a map of the US. There are lots of libraries that do all the hard work for you, so the key is just knowing that they exist and how to use them.

Shapely Tutorial

January 14, 2024

In geographic information systems (GIS), it’s important to know how to manipulate geometric data. The best tool for this in Python is Shapely, which provides an extensive set of operations for sophisticated analysis of spatial data. In this post, I give an introduction to working with Shapely. PostGIS data usually comes in the form of WKB (well-known binary) or WKT (well-known text), so we’ll also cover how to work with those formats.
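A minimal sketch of round-tripping a geometry through WKT and WKB with Shapely, assuming Shapely is installed (the `wkt.loads`, `wkb.loads`, `.wkt`, `.wkb`, and `.wkb_hex` calls exist in both the 1.8 and 2.x series). The square and point are hypothetical example geometries; in practice the WKB might come from a PostGIS query, where a database driver may hand you a hex-encoded string rather than raw bytes.

```python
from shapely import wkb, wkt
from shapely.geometry import Point, Polygon

# Hypothetical example geometries: a unit square and a point at its center.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
center = Point(0.5, 0.5)

# Serialize: WKT is human-readable text, WKB is compact binary.
as_text = square.wkt
as_bytes = square.wkb
as_hex = square.wkb_hex  # WKB encoded as a hex string

# Deserialize each representation back into a geometry object.
from_text = wkt.loads(as_text)
from_bytes = wkb.loads(as_bytes)
from_hex = wkb.loads(as_hex, hex=True)

print(as_text)
print(from_bytes.area, from_hex.contains(center))
```

Once the data is back in Shapely objects, the usual geometric operations (area, containment, intersection, and so on) work regardless of which format it arrived in.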

Overhead Imagery Restrictions

November 30, 2023

Over the past few decades, government regulations around overhead and satellite imagery have seen significant evolution. Initially strict and limiting, these regulations have gradually been adjusted to reflect technological advancements and market realities. This post delves into the history and recent changes in these regulations, particularly focusing on the U.S. market and suppliers.

How Violin and Box Plots Obscure Data

August 27, 2023

Data visualization is an essential tool for data scientists, enabling them to both explore and explain data, and it is one of the most important parts of their toolkit. Despite its significance, I frequently encounter exploratory data analysis (EDA) visualizations that mask important aspects of the data, which leads people to misunderstand their data and make bad decisions based on it. In my experience, violin and box plots are the worst offenders. In this post, I will demonstrate some of the drawbacks of violin and box plots, and I will suggest some alternative visualizations that offer a clearer representation of the data.
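To make the box plot point concrete, here is a minimal sketch (not taken from the post) of one way a box plot can hide structure: two hypothetical datasets with identical five-number summaries but very different shapes. It assumes the "inclusive" quartile method of Python’s `statistics.quantiles`; plotting libraries may compute quartiles slightly differently. Since no point in either dataset lies more than 1.5 × IQR beyond the box, standard Tukey whiskers would reach the min and max in both cases, so the two box plots would look identical.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values a standard box plot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))

def histogram(data, width=2, n_bins=6):
    """Count points per fixed-width bin; the last bin also includes the right edge."""
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x // width), n_bins - 1)] += 1
    return counts

# Hypothetical datasets: one evenly spread, one with two tight clusters.
evenly_spread = list(range(13))  # 0, 1, ..., 12
two_clusters = [0, 2.8, 2.9, 3, 3.1, 3.2, 6, 8.8, 8.9, 9, 9.1, 9.2, 12]

print(five_number_summary(evenly_spread))  # (0, 3.0, 6.0, 9.0, 12)
print(five_number_summary(two_clusters))   # (0, 3.0, 6.0, 9.0, 12)
print(histogram(evenly_spread))            # [2, 2, 2, 2, 2, 3]
print(histogram(two_clusters))             # [1, 5, 0, 1, 5, 1]
```

Even this coarse histogram reveals the gap in the middle of the second dataset, which is exactly the information the box throws away.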

JQL Cheat Sheet

August 04, 2023

Jira Query Language (JQL) is a powerful tool that allows users to perform advanced searches in Jira. This post contains some of my tips and tricks for working with it.
