In this post, let’s visualize the internals of a transformer model. These visualizations reveal some interesting patterns that can help us understand how well the training is going.
The dis module is a great tool for understanding how code runs. While I mainly use it out of curiosity, it can also be valuable for optimization and debugging. The module lets you disassemble your Python code into bytecode: the low-level, intermediate representation that the interpreter actually executes. By examining bytecode, you can glimpse the Python interpreter's view of your code, shedding light on performance characteristics and operational behaviors that aren't apparent at the source code level.
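As a minimal sketch of what this looks like, `dis.Bytecode` lets you walk the instructions of a simple function programmatically (the exact opcodes vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# Iterate over the bytecode instructions the interpreter will execute.
# Each instruction has an opname, argument, and source-line mapping.
for instr in dis.Bytecode(add):
    print(instr.opname, instr.argrepr)
```

`dis.dis(add)` prints the same information in a formatted listing, which is usually what you want when exploring interactively.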
YData Profiling used to be known as pandas-profiling, but it has moved to a new name and a new home. I talked about it in my post on cleaning DNA splice junction data, but since it was kind of buried in that post and the name has changed, I thought I would do a quick tutorial that only covers YData Profiling. There isn't much to demo here because it does so much of the work for you, but I'll still go over it.
This tutorial shows how to plot geospatial data on a map of the US. There are lots of libraries that do all the hard work for you, so the key is just knowing that they exist and how to use them.
In geographic information systems (GIS), it's important to know how to manipulate geometric data. The best tool for this in Python is Shapely, which provides an extensive set of operations for sophisticated analysis of spatial data. In this post, I give an introduction to working with Shapely. PostGIS data usually arrives as well-known binary (WKB) or well-known text (WKT), so we'll also talk about how to work with those formats.
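A quick sketch of the WKT/WKB round trip with Shapely looks like this:

```python
from shapely import wkb, wkt
from shapely.geometry import Point

p = Point(1.0, 2.0)

# Serialize to the text and binary formats PostGIS commonly returns.
as_text = p.wkt  # 'POINT (1 2)'
as_blob = p.wkb  # raw bytes

# Round-trip both representations back into Shapely geometries.
from_text = wkt.loads(as_text)
from_blob = wkb.loads(as_blob)
print(from_text.equals(p), from_blob.equals(p))  # True True
```

Once parsed, the geometries support the usual Shapely operations (buffers, intersections, containment tests, and so on).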
Over the past few decades, government regulations around overhead and satellite imagery have seen significant evolution. Initially strict and limiting, these regulations have gradually been adjusted to reflect technological advancements and market realities. This post delves into the history and recent changes in these regulations, particularly focusing on the U.S. market and suppliers.
Data visualization is an essential tool for data scientists, enabling them to both explore and explain data. Despite its significance, I frequently encounter exploratory data analysis (EDA) visualizations that mask important aspects of the data, which leads people to misunderstand their data and make bad decisions based on it. In my experience, the worst offenders are violin and box plots. In this post, I will demonstrate some of the drawbacks of violin plots and box plots, and I will suggest some alternative visualizations that represent the data more clearly.