Julius’ Data Science Blog

A Guide to the Python Disassembler Module

2024-07-24T00:00:00+00:00

The dis module is a great tool for understanding how code runs. While I mainly use it out of curiosity, it can also be valuable for optimization and debugging. The module disassembles Python bytecode, the low-level, intermediate representation that your Python code is compiled to, into a human-readable listing. By examining bytecode, programmers can glimpse the Python interpreter’s view of their code, shedding light on performance characteristics and operational behaviors that aren’t apparent at the source code level.

In this post, we’ll look into the dis module. We’ll start by understanding what Python bytecode is and why it matters. Then, we’ll dive into the basics of using the dis module, gradually advancing to its more intricate applications.

Table of Contents

Python Bytecode
- Bytecode vs. Source Code vs. Machine Code
Getting Started with the dis Module
- Understanding the Disassembly Output
- Working with classes
Advanced Usage of the dis Module

Python Bytecode

First, let’s understand what Python bytecode is. Bytecode is an intermediate, low-level representation of your Python code, produced by the interpreter’s compiler. Unlike machine code, bytecode is not executed directly by the hardware but by the Python Virtual Machine (PVM), the interpreter loop that reads each instruction and carries it out. This layer of abstraction helps Python stay platform independent: the source code is compiled the same way everywhere, and the PVM handles the machine-specific work.

Bytecode vs. Source Code vs. Machine Code

To appreciate the significance of bytecode, it’s important to distinguish it from source code and machine code:

Source Code: This is the code you write in Python, characterized by its readability and high-level syntax. It’s the starting point of the execution process.
Bytecode: When you run a Python program, the interpreter first compiles the source code into bytecode. This compilation happens automatically and is a step towards execution. Bytecode is more abstract than machine code and less readable than source code.
Machine Code: Machine code is the set of instructions executed directly by the computer’s CPU. In CPython, the standard interpreter, the PVM does not by default translate your bytecode into machine code; it is itself a compiled program that executes each bytecode instruction in turn.
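
To make the source-to-bytecode step concrete, here is a minimal sketch using the built-in compile() function. The expression and the variable names are only illustrative, and the raw bytes you get will differ between Python versions.

```python
# Compile a small expression into a code object and peek at its raw bytecode.
source = "x * 2"
code = compile(source, "<example>", "eval")

# co_code holds the bytecode as a bytes object; its contents vary by Python version.
raw = code.co_code
print(type(raw).__name__, len(raw))

# The interpreter can execute the code object like any other compiled code.
result = eval(code, {"x": 21})
print(result)  # 42
```

The code object is what the dis module reads when it produces a listing.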

Getting Started with the dis Module

Now let’s use the dis module. It’s part of Python’s standard library, so you don’t need any additional installations to start using it. To begin, simply import the module into your Python script:

import dis

The core function in the dis module is dis.dis(), which is used to disassemble Python functions, methods, and code objects. Here’s a simple example:

def doubler(x):
    return x * 2

dis.dis(doubler)

  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (x)
              4 LOAD_CONST               1 (2)
              6 BINARY_OP                5 (*)
             10 RETURN_VALUE

This code outputs the disassembled bytecode of doubler. Let’s talk about what all this means.

Understanding the Disassembly Output

The output of dis.dis() typically includes the following columns:

Line number: Indicates the line number in your source code.
Byte offset: The position of the instruction in the bytecode sequence.
Operation name: The human-readable name of the operation (e.g., LOAD_FAST, BINARY_OP, etc.).
Argument: Additional data needed for some operations (e.g., variable names, constants).
Argument details: (in parentheses) Further explanation of the argument, such as variable names or constant values.

Let’s look at our example. Each line corresponds to an instruction in the bytecode:

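RESUME is a no-op that marks the start of the function; the interpreter uses it for internal bookkeeping.
LOAD_FAST pushes the local variable x onto the stack.
LOAD_CONST pushes the constant 2 onto the stack.
BINARY_OP pops the two topmost items, multiplies them (the * in parentheses), and pushes the result. Python versions before 3.11 show this instruction as BINARY_MULTIPLY.
RETURN_VALUE returns the value on top of the stack.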
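
If the stack behavior is hard to picture, a toy evaluator can help. The sketch below is not how CPython is implemented; it is a hand-written instruction list that mirrors the doubler listing, run by a small loop with an explicit stack. The names PROGRAM, BINARY_OPS, and run are made up for this illustration.

```python
import operator

# Hand-written (opname, argument) pairs mirroring the doubler listing; not real bytecode.
PROGRAM = [
    ("LOAD_FAST", "x"),
    ("LOAD_CONST", 2),
    ("BINARY_OP", "*"),
    ("RETURN_VALUE", None),
]

BINARY_OPS = {"*": operator.mul, "+": operator.add}


def run(program, local_vars):
    """Evaluate a toy instruction list with an explicit value stack."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])  # push a local variable
        elif opname == "LOAD_CONST":
            stack.append(arg)  # push a constant
        elif opname == "BINARY_OP":
            right = stack.pop()  # the top of the stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[arg](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()  # hand back the top of the stack
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise ValueError("program ended without RETURN_VALUE")


print(run(PROGRAM, {"x": 21}))  # 42
```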

Working with classes

The dis module can also disassemble methods within classes:

class MyClass:
    def add_one(self, x):
        return x + 1

dis.dis(MyClass.add_one)

  2           0 RESUME                   0

  3           2 LOAD_FAST                1 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE
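
You can also pass the class itself to dis.dis(), which disassembles every method it finds. The sketch below uses a hypothetical Shifter class and captures the listing in a string through the file argument, so the section headers can be inspected.

```python
import dis
import io


class Shifter:  # hypothetical class, for illustration only
    def increment(self, x):
        return x + 1

    def decrement(self, x):
        return x - 1


# dis.dis() accepts a file argument, so the listing can be captured as a string.
buffer = io.StringIO()
dis.dis(Shifter, file=buffer)
listing = buffer.getvalue()

# Each method gets its own "Disassembly of <name>:" header in the listing.
headers = [line for line in listing.splitlines() if line.startswith("Disassembly of")]
print(headers)
```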

Advanced Usage of the dis Module

Now let’s look at the more advanced capabilities of the dis module.

Exploring the Bytecode Object with dis.Bytecode

For more detailed analysis, the dis.Bytecode class offers a richer interface. It provides an iterator over the individual instructions in the bytecode:

for instruction in dis.Bytecode(MyClass.add_one):
    print(instruction.opname, instruction.argval)

RESUME 0
LOAD_FAST x
LOAD_CONST 1
BINARY_OP 0
RETURN_VALUE None

This approach allows you to examine each operation in more detail and is particularly helpful for processing or analyzing the bytecode programmatically.
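
As one example of programmatic analysis, the sketch below tallies how often each operation appears in a function. It uses dis.get_instructions(), which yields the same instruction objects as dis.Bytecode; the add_one function here is a standalone stand-in, and the exact operation names depend on your Python version.

```python
import dis
from collections import Counter


def add_one(x):
    return x + 1


# Tally how often each operation name appears in the function's bytecode.
opname_counts = Counter(ins.opname for ins in dis.get_instructions(add_one))

for opname, count in opname_counts.most_common():
    print(f"{opname:<20} {count}")
```

On a larger function, a tally like this gives a quick overview of which operations dominate.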

Control Structures in Bytecode

You can also use dis to analyze control structures like loops and conditionals. Here’s a simple for-loop:

def for_loop_example():
    for i in range(3):
        print(i)

dis.dis(for_loop_example)

  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + range)
             12 LOAD_CONST               1 (3)
             14 CALL                     1
             22 GET_ITER
        >>   24 FOR_ITER                13 (to 54)
             28 STORE_FAST               0 (i)

  3          30 LOAD_GLOBAL              3 (NULL + print)
             40 LOAD_FAST                0 (i)
             42 CALL                     1
             50 POP_TOP
             52 JUMP_BACKWARD           15 (to 24)

  2     >>   54 END_FOR
             56 RETURN_CONST             0 (None)
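
Conditionals show up as jump instructions too. Here is a small sketch, using a made-up sign_label function, that lists the jump instructions and the offsets they can land on. The exact jump names and offsets vary by Python version, so no output is shown.

```python
import dis


def sign_label(x):
    if x < 0:
        return "negative"
    return "non-negative"


instructions = list(dis.get_instructions(sign_label))

# Instructions whose name mentions JUMP transfer control to another offset.
jumps = [ins for ins in instructions if "JUMP" in ins.opname]

# is_jump_target flags instructions that some jump lands on
# (marked with >> in the listing above).
targets = [ins.offset for ins in instructions if ins.is_jump_target]

for ins in jumps:
    print(ins.offset, ins.opname, ins.argrepr)
print("jump targets:", targets)
```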

Identifying Performance Bottlenecks

Sometimes you can write code that seems perfectly efficient, but is actually much slower than it needs to be. For example, take a look at this function:

def inefficient_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

It doesn’t use the built-in sum function, but there’s nothing obviously wrong with it. And it’s not wrong, but let’s look at the bytecode.

dis.dis(inefficient_sum)

  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (0)
              4 STORE_FAST               1 (total)

  3           6 LOAD_GLOBAL              1 (NULL + range)
             16 LOAD_FAST                0 (n)
             18 CALL                     1
             26 GET_ITER
        >>   28 FOR_ITER                 7 (to 46)
             32 STORE_FAST               2 (i)

  4          34 LOAD_FAST                1 (total)
             36 LOAD_FAST                2 (i)
             38 BINARY_OP               13 (+=)
             42 STORE_FAST               1 (total)
             44 JUMP_BACKWARD            9 (to 28)

  3     >>   46 END_FOR

  5          48 LOAD_FAST                1 (total)
             50 RETURN_VALUE

Without getting lost in the details, the thing to notice is that there are a fair number of instructions, and that the ones between FOR_ITER and JUMP_BACKWARD run once per iteration. Let’s compare this to a more efficient version that uses the built-in sum function.

def efficient_sum(n):
    return sum(range(n))

dis.dis(efficient_sum)

  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + sum)
             12 LOAD_GLOBAL              3 (NULL + range)
             22 LOAD_FAST                0 (n)
             24 CALL                     1
             32 CALL                     1
             40 RETURN_VALUE

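The first thing you’ll notice is that the bytecode for efficient_sum is much shorter and contains no loop. Instruction count alone is not a reliable measure of speed, but here it points to the real difference: inefficient_sum executes several bytecode instructions per iteration in the interpreter, while efficient_sum hands the whole loop to sum, which runs in C. It’s also good to look at the type of operation, as some operations are more costly than others. Timing both versions confirms the difference.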

%timeit inefficient_sum(10)

116 ns ± 1.22 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%timeit efficient_sum(10)

89.7 ns ± 0.628 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%timeit inefficient_sum(100)

1.13 μs ± 7.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit efficient_sum(100)

384 ns ± 2.54 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

That’s a significant difference, and it grows with the number of values you are summing: at n = 10 the built-in version is about 1.3 times faster, and at n = 100 it is about 3 times faster.
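
The %timeit lines above are IPython magics, so they only work in a notebook or IPython shell. Outside of one, a rough equivalent can be sketched with the standard library’s timeit module. The repeat count of 10,000 is an arbitrary choice, and the numbers will differ from machine to machine.

```python
import timeit


def inefficient_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def efficient_sum(n):
    return sum(range(n))


n = 100
calls = 10_000  # arbitrary repeat count for this sketch

# Both versions must agree before their timings are worth comparing.
assert inefficient_sum(n) == efficient_sum(n)

# timeit.timeit returns the total seconds taken by `number` calls.
loop_time = timeit.timeit(lambda: inefficient_sum(n), number=calls)
builtin_time = timeit.timeit(lambda: efficient_sum(n), number=calls)

print(f"loop:    {loop_time / calls * 1e9:.0f} ns per call")
print(f"builtin: {builtin_time / calls * 1e9:.0f} ns per call")
```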

External Functions

You can also use dis on imported functions. Here is an example with requests.get:

import requests

dis.dis(requests.get)

 62           0 RESUME                   0

 73           2 LOAD_GLOBAL              1 (NULL + request)
             12 LOAD_CONST               1 ('get')
             14 LOAD_FAST                0 (url)
             16 BUILD_TUPLE              2
             18 LOAD_CONST               2 ('params')
             20 LOAD_FAST                1 (params)
             22 BUILD_MAP                1
             24 LOAD_FAST                2 (kwargs)
             26 DICT_MERGE               1
             28 CALL_FUNCTION_EX         1
             30 RETURN_VALUE

Limitations

Unfortunately, dis cannot do everything. In particular, it only works on functions that are implemented in Python. Many libraries, such as NumPy and PyTorch, have functions written in C or C++, and dis raises a TypeError if you try to use it on them.

import math
import numpy as np

try:
    dis.dis(math.sqrt)
except TypeError as e:
    print(f"Error: {e}")

Error: don't know how to disassemble builtin_function_or_method objects

try:
    dis.dis(np.sqrt)
except TypeError as e:
    print(f"Error: {e}")

Error: don't know how to disassemble ufunc objects
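
If you want to check ahead of time whether something can be disassembled, one option is to wrap the attempt in a small helper. This is only a sketch: can_disassemble is a made-up name, and it relies on dis.Bytecode raising the same TypeError as dis.dis.

```python
import dis
import json
import math


def can_disassemble(obj):
    """Return True if dis can find Python bytecode for obj."""
    try:
        dis.Bytecode(obj)
    except TypeError:
        return False
    return True


print(can_disassemble(json.dumps))  # True: implemented in Python
print(can_disassemble(math.sqrt))   # False: implemented in C
```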

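In these cases, you can still time the functions. This is how I learned that, when called on one Python number at a time, math.sqrt is around an order of magnitude faster than np.sqrt, which surprised me. I would have thought np.sqrt would be faster. The likely explanation is that np.sqrt is designed to work on whole arrays, so calling it on individual scalars pays NumPy’s per-call overhead every time.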

%timeit [math.sqrt(n) for n in range(100)]

2.59 μs ± 9.01 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit [np.sqrt(n) for n in range(100)]

40.7 μs ± 582 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Geospatial Data Plotting Tutorial

2024-04-03T00:00:00+00:00

This tutorial shows how to plot geospatial data on a map of the US. There are lots of libraries that do all the hard work for you, so the key is just knowing that they exist and how to use them.

Table of Contents

Download Map from the Internet
Use Downloaded Shapefile
ArcGIS
Folium
Contextily

import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd

One of the things you’ll have to do is find the right data for mapping. Versions of geopandas before 1.0 ship with a few small datasets that you can use, although they are deprecated.

The key to plotting geospatial data is in shapefiles. A shapefile is a geospatial vector data format for geographic information system (GIS) software. It is used for storing the location, shape, and attributes of geographic features, such as roads, lakes, or political boundaries. A shapefile is actually a collection of files that work together. The main file (.shp) stores the geometry of the features, the index file (.shx) contains the index of the geometry, and the dBASE table (.dbf) contains attribute information for each shape. Additional files can also be included to store other types of information.
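
Because a shapefile is several files, a common source of read errors is a missing companion file. The sketch below checks for the three parts named above using only the standard library. The function name and the path are hypothetical, and other companion files (such as a .prj file with projection information) are treated as optional here.

```python
from pathlib import Path

# The three parts named above; other companion files are treated as optional.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required suffixes that have no file next to shp_path."""
    base = Path(shp_path)
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not base.with_suffix(suffix).exists()
    ]


# Hypothetical path; replace it with the location of your own download.
print(missing_shapefile_parts("data/states.shp"))
```

An empty list means all three parts are present.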

Download Map from the Internet

# Load a map of the US
us_map = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')).query('continent == "North America" and name == "United States of America"')
us_map

C:\Users\Julius\AppData\Local\Temp\ipykernel_8928\39096665.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  us_map = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')).query('continent == "North America" and name == "United States of America"')

	pop_est	continent	name	iso_a3	gdp_md_est	geometry
4	328239523.0	North America	United States of America	USA	21433226	MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Now we can plot it.

fig, ax = plt.subplots(figsize=(10, 10))
us_map.boundary.plot(ax=ax)
ax.set_title("Map of USA")

plt.show()

There are other built-in datasets, although, as the warning shows, they’re deprecated too. For example, you can also load cities.

# Load the naturalearth cities dataset
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities')) # note this is worldwide

# Plot the US map as background
fig, ax = plt.subplots(figsize=(15, 10))
us_map.plot(ax=ax, color='lightgray')

# Plot cities on top
cities.plot(ax=ax, marker='o', color='red', markersize=5)

# Focus on the US by setting the limits
ax.set_xlim([-130, -65])
ax.set_ylim([25, 50])

plt.show()

C:\Users\Julius\AppData\Local\Temp\ipykernel_8928\4049436674.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_cities' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities')) # note this is worldwide

Use Downloaded Shapefile

Because the built-in datasets are (unfortunately) deprecated, you will probably have to download your own shapefiles. To use a downloaded shapefile, simply point geopandas to its .shp file.

us_states = gpd.read_file('States_shapefile-shp/States_shapefile.shp')
us_states.head()

	FID	Program	State_Code	State_Name	Flowing_St	FID_1	geometry
0	1	PERMIT TRACKING	AL	ALABAMA	F	919	POLYGON ((-85.07007 31.98070, -85.11515 31.907...
1	2	None	AK	ALASKA	N	920	MULTIPOLYGON (((-161.33379 58.73325, -161.3824...
2	3	AZURITE	AZ	ARIZONA	F	921	POLYGON ((-114.52063 33.02771, -114.55909 33.0...
3	4	PDS	AR	ARKANSAS	F	922	POLYGON ((-94.46169 34.19677, -94.45262 34.508...
4	5	None	CA	CALIFORNIA	N	923	MULTIPOLYGON (((-121.66522 38.16929, -121.7823...

We can plot it the same way.

fig, ax = plt.subplots(figsize=(10, 10))
us_states.boundary.plot(ax=ax)
ax.set_title("Map of US States")

plt.show()

ArcGIS

You can also get data from ArcGIS.

# URL of the shapefile
url = 'https://opendata.arcgis.com/datasets/1b02c87f62d24508970dc1a6df80c98e_0.zip'

# Read the shapefile directly from the URL
states = gpd.read_file(url)

# Plot it
fig, ax = plt.subplots(figsize=(12, 8))
states.plot(ax=ax, edgecolor='black', facecolor='white', linewidth=0.5)
ax.set_title('Map of US States', fontsize=16)
ax.axis('off')
plt.tight_layout()
plt.show()

Folium

You can also build an interactive map with folium, using GeoPy’s Nominatim geocoder to turn street addresses into coordinates.

import folium
from geopy.geocoders import Nominatim

# Create a sample DataFrame with addresses
data = {
    "Address": [
        "1600 Pennsylvania Avenue NW, Washington, DC 20500",
        "One Apple Park Way, Cupertino, CA 95014",
        "1 Tesla Road, Austin, TX 78725",
        "1 Microsoft Way, Redmond, WA 98052",
        "1 Amazon Way, Seattle, WA 98109",
    ]
}
df = pd.DataFrame(data)

# Initialize the geocoder
geolocator = Nominatim(user_agent="my_app")


# Function to geocode addresses and return latitude and longitude
def geocode(address):
    location = geolocator.geocode(address)
    if location:
        return [location.latitude, location.longitude]
    return None


# Apply the geocode function to the 'Address' column
df["Coordinates"] = df["Address"].apply(geocode)

# Create a Folium map centered on the United States
map_center = [37.0902, -95.7129]  # Coordinates for the center of the US
map_zoom = 4
usa_map = folium.Map(location=map_center, zoom_start=map_zoom)

# Iterate over the DataFrame rows and add markers to the map
for _, row in df.iterrows():
    if row["Coordinates"]:
        folium.Marker(location=row["Coordinates"], popup=row["Address"]).add_to(usa_map)

# Display the map
usa_map


Contextily

Another option is to use contextily.

import contextily as ctx

# Ensure the CRS is compatible with web tile services
us_states_crs = us_states.to_crs(epsg=3857)

ax = us_states_crs.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)
plt.show()
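
EPSG:3857 is the Web Mercator projection that most web tile services use, which is why the data is reprojected before the basemap is added. To give a feel for what the reprojection does, here is a rough sketch of the spherical Web Mercator formula in plain Python. It assumes the input is plain longitude and latitude in degrees; in practice to_crs does this for you, and the function name is made up for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate center of the US, as used for the folium map above.
x, y = lonlat_to_web_mercator(-95.7129, 37.0902)
print(round(x), round(y))
```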

YData Profiling Tutorial

2024-04-03T00:00:00+00:00

YData Profiling used to be known as pandas-profiling, but it has moved to a new name and a new home. I talked about it in my post on cleaning DNA splice junction data, but since it was kind of buried in the post and the name has changed, I thought I would do a quick tutorial that only covers YData Profiling. There isn’t much to demo here because it does so much of the work for you, but I’ll still go over it.

ydata_profiling is a Python library that generates comprehensive reports from a pandas or Spark DataFrame. These reports include detailed exploratory data analysis, providing insights into missing data, variable distributions, correlations, and much more. It’s a powerful tool for initial data investigation and can save a lot of time in the data understanding phase of a project.

import pandas as pd
import seaborn as sns
from ydata_profiling import ProfileReport  # pip install ydata_profiling if you haven't installed it

Let’s grab the Titanic dataset.

df = sns.load_dataset('titanic')
df

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

# Generate the profile report
profile = ProfileReport(df, title='Titanic Data Report', explorative=False)

To view the report, you can use profile.to_widgets(). That doesn’t display well on the blog, so instead I’ll use profile.to_notebook_iframe().

# profile.to_widgets()  # my recommended choice in a notebook, but it doesn't display well on the blog
profile.to_notebook_iframe()

Titanic Data Report

Overview

Dataset statistics
Number of variables 15
Number of observations 891
Missing cells 869
Missing cells (%) 6.5%
Duplicate rows 53
Duplicate rows (%) 5.9%
Total size in memory 80.7 KiB
Average record size in memory 92.7 B

Variable types
Categorical 8
Numeric 4
Boolean 3

Alerts (6)
Dataset has 53 (5.9%) duplicate rows
age has 177 (19.9%) missing values
deck has 688 (77.2%) missing values
sibsp has 608 (68.2%) zeros
parch has 678 (76.1%) zeros
fare has 15 (1.7%) zeros

Reproduction
Analysis started 2024-04-04 06:12:02.524736
Analysis finished 2024-04-04 06:12:04.835479
Duration 2.31 seconds
Software version ydata-profiling v4.7.0

Variables

survived
Categorical
Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 7.1 KiB

Length
Max length 1
Median length 1
Mean length 1
Min length 1
[Figure: histogram of lengths of the category]

Characters and Unicode
Total characters 891
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique
Unique 0
Unique (%) 0.0%

Sample
1st row 0
2nd row 1
3rd row 1
4th row 1
5th row 0

Common Values
Value Count Frequency (%)
0 549 61.6%
1 342 38.4%

Most occurring categories, scripts, and blocks
Value Count Frequency (%)
(unknown) 891 100.0%

pclass
Categorical
Distinct 3
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 7.1 KiB

Length
Max length 1
Median length 1
Mean length 1
Min length 1
[Figure: histogram of lengths of the category]

Characters and Unicode
Total characters 891
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique
Unique 0
Unique (%) 0.0%

Sample
1st row 3
2nd row 1
3rd row 3
4th row 1
5th row 3

Common Values
Value Count Frequency (%)
3 491 55.1%
1 216 24.2%
2 184 20.7%

Most occurring categories, scripts, and blocks
Value Count Frequency (%)
(unknown) 891 100.0%

sex
Categorical
Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 7.1 KiB

Length
Max length 6
Median length 4
Mean length 4.704826
Min length 4
[Figure: histogram of lengths of the category]

Characters and Unicode
Total characters 4192
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique
Unique 0
Unique (%) 0.0%

Sample
1st row male
2nd row female
3rd row female
4th row female
5th row male

Common Values
Value Count Frequency (%)
male 577 64.8%
female 314 35.2%

Most occurring characters
Value Count Frequency (%)
e 1205 28.7%
m 891 21.3%
a 891 21.3%
l 891 21.3%
f 314 7.5%

Most occurring categories, scripts, and blocks
Value Count Frequency (%)
(unknown) 4192 100.0%

age
Real number (ℝ)
MISSING
Distinct 88
Distinct (%) 12.3%
Missing 177
Missing (%) 19.9%
Infinite 0
Infinite (%) 0.0%
Mean 29.699118
Minimum 0.42
Maximum 80
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 7.1 KiB

Quantile statistics
Minimum 0.42
5-th percentile 4
Q1 20.125
median 28
Q3 38
95-th percentile 56
Maximum 80
Range 79.58
Interquartile range (IQR) 17.875

Descriptive statistics
Standard deviation 14.526497
Coefficient of variation (CV) 0.48912219
Kurtosis 0.17827415
Mean 29.699118
Median Absolute Deviation (MAD) 9
Skewness 0.38910778
Sum 21205.17
Variance 211.01912
Monotonicity Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value Count Frequency (%)
24 30 3.4%
22 27 3.0%
18 26 2.9%
28 25 2.8%
30 25 2.8%
19 25 2.8%
21 24 2.7%
25 23 2.6%
36 22 2.5%
29 20 2.2%
Other values (78) 467 52.4%
(Missing) 177 19.9%

Minimum 10 values
Value Count Frequency (%)
0.42 1 0.1%
0.67 1 0.1%
0.75 2 0.2%
0.83 2 0.2%
0.92 1 0.1%
1 7 0.8%
2 10 1.1%
3 6 0.7%
4 10 1.1%
5 4 0.4%

Maximum 10 values
Value Count Frequency (%)
80 1 0.1%
74 1 0.1%
71 2 0.2%
70.5 1 0.1%
70 2 0.2%
66 1 0.1%
65 3 0.3%
64 2 0.2%
63 2 0.2%
62 4 0.4%

sibsp
Real number (ℝ)
ZEROS
Distinct 7
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.52300786
Minimum 0
Maximum 8
Zeros 608
Zeros (%) 68.2%
Negative 0
Negative (%) 0.0%
Memory size 7.1 KiB

Quantile statistics
Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 3
Maximum 8
Range 8
Interquartile range (IQR) 1

Descriptive statistics
Standard deviation 1.1027434
Coefficient of variation (CV) 2.1084644
Kurtosis 17.88042
Mean 0.52300786
Median Absolute Deviation (MAD) 0
Skewness 3.6953517
Sum 466
Variance 1.2160431
Monotonicity Not monotonic

[Figure: histogram with fixed size bins (bins=7)]

Common values
Value Count Frequency (%)
0 608 68.2%
1 209 23.5%
2 28 3.1%
4 18 2.0%
3 16 1.8%
8 7 0.8%
5 5 0.6%

Minimum 10 values
Value Count Frequency (%)
0 608 68.2%
1 209 23.5%
2 28 3.1%
3 16 1.8%
4 18 2.0%
5 5 0.6%
8 7 0.8%

Maximum 10 values
Value Count Frequency (%)
8 7 0.8%
5 5 0.6%
4 18 2.0%
3 16 1.8%
2 28 3.1%
1 209 23.5%
0 608 68.2%

parch
Real number (ℝ)
ZEROS
Distinct 7
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.38159371
Minimum 0
Maximum 6
Zeros 678
Zeros (%) 76.1%
Negative 0
Negative (%) 0.0%
Memory size 7.1 KiB

Quantile statistics
Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 2
Maximum 6
Range 6
Interquartile range (IQR) 0

Descriptive statistics
Standard deviation 0.80605722
Coefficient of variation (CV) 2.1123441
Kurtosis 9.7781252
Mean 0.38159371
Median Absolute Deviation (MAD) 0
Skewness 2.749117
Sum 340
Variance 0.64972824
Monotonicity Not monotonic

[Figure: histogram with fixed size bins (bins=7)]

Common values
Value Count Frequency (%)
0 678 76.1%
1 118 13.2%
2 80 9.0%
5 5 0.6%
3 5 0.6%
4 4 0.4%
6 1 0.1%

Minimum 10 values
Value Count Frequency (%)
0 678 76.1%
1 118 13.2%
2 80 9.0%
3 5 0.6%
4 4 0.4%
5 5 0.6%
6 1 0.1%

Maximum 10 values
Value Count Frequency (%)
6 1 0.1%
5 5 0.6%
4 4 0.4%
3 5 0.6%
2 80 9.0%
1 118 13.2%
0 678 76.1%

fare
Real number (ℝ)
ZEROS
Distinct 248
Distinct (%) 27.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 32.204208
Minimum 0
Maximum 512.3292
Zeros 15
Zeros (%) 1.7%
Negative 0
Negative (%) 0.0%
Memory size 7.1 KiB

Quantile statistics
Minimum 0
5-th percentile 7.225
Q1 7.9104
median 14.4542
Q3 31
95-th percentile 112.07915
Maximum 512.3292
Range 512.3292
Interquartile range (IQR) 23.0896

Descriptive statistics
Standard deviation 49.693429
Coefficient of variation (CV) 1.5430725
Kurtosis 33.398141
Mean 32.204208
Median Absolute Deviation (MAD) 6.9042
Skewness 4.7873165
Sum 28693.949
Variance 2469.4368
Monotonicity Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value Count Frequency (%)
8.05 43 4.8%
13 42 4.7%
7.8958 38 4.3%
7.75 34 3.8%
26 31 3.5%
10.5 24 2.7%
7.925 18 2.0%
7.775 16 1.8%
7.2292 15 1.7%
0 15 1.7%
Other values (238) 615 69.0%

Minimum 10 values
Value Count Frequency (%)
0 15 1.7%
4.0125 1 0.1%
5 1 0.1%
6.2375 1 0.1%
6.4375 1 0.1%
6.45 1 0.1%
6.4958 2 0.2%
6.75 2 0.2%
6.8583 1 0.1%
6.95 1 0.1%

Maximum 10 values
Value Count Frequency (%)
512.3292 3 0.3%
263 4 0.4%
262.375 2 0.2%
247.5208 2 0.2%
227.525 4 0.4%
221.7792 1 0.1%
211.5 1 0.1%
211.3375 3 0.3%
164.8667 2 0.2%
153.4625 3 0.3%

embarked
Categorical
Distinct 3
Distinct (%) 0.3%
Missing 2
Missing (%) 0.2%
Memory size 7.1 KiB

Length
Max length 1
Median length 1
Mean length 1
Min length 1
[Figure: histogram of lengths of the category]

Characters and Unicode
Total characters 889
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique
Unique 0
Unique (%) 0.0%

Sample
1st row S
2nd row C
3rd row S
4th row S
5th row S

Common Values
Value Count Frequency (%)
S 644 72.3%
C 168 18.9%
Q 77 8.6%
(Missing) 2 0.2%

Most occurring characters
Value Count Frequency (%)
S 644 72.4%
C 168 18.9%
Q 77 8.7%

Most occurring categories, scripts, and blocks
Value Count Frequency (%)
(unknown) 889 100.0%

class
Categorical
Distinct 3
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 1.1 KiB

Length
Max length 6
Median length 5
Mean length 5.2065095
Min length 5
[Figure: histogram of lengths of the category]

Characters and Unicode
Total characters 4639
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique
Unique 0
Unique (%) 0.0%

Sample
1st row Third
2nd row First
3rd row Third
4th row First
5th row Third

Common Values
Value Count Frequency (%)
Third 491 55.1%
First 216 24.2%
Second 184 20.7%

Most occurring characters
Value Count Frequency (%)
i 707 15.2%
r 707 15.2%
d 675 14.6%
T 491 10.6%
h 491 10.6%
F 216 4.7%
s 216 4.7%
t 216 4.7%
S 184 4.0%
e 184 4.0%
Other values (3) 552 11.9%

Most occurring categories, scripts, and blocks
Value Count Frequency (%)
(unknown) 4639 100.0%

who
Categorical
Distinct 3
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 7.1 KiB
 man  537  
 woman  271  
 child  83  
Overview
Categories
Words
Characters
Length
Max length 5
Median length 3
Mean length 3.7946128
Min length 3
Characters and Unicode
Total characters 3381
Distinct characters 10
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
 The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables. 
Unique
Unique 0 ?
Unique (%) 0.0%
Sample
1st row man
2nd row woman
3rd row woman
4th row woman
5th row man
Common Values
Value Count Frequency (%)
 man 537  60.3% 
 woman 271  30.4% 
 child 83    
 9.3% 
Length
 Histogram of lengths of the category 
Common Values (Plot)
Value Count Frequency (%)
 man 537  60.3% 
 woman 271  30.4% 
 child 83    
 9.3% 
Characters
Categories
Scripts
Blocks
Most occurring characters
Value Count Frequency (%)
 m 808  23.9% 
 a 808  23.9% 
 n 808  23.9% 
 w 271    
 8.0% 
 o 271    
 8.0% 
 c 83    
 2.5% 
 h 83    
 2.5% 
 i 83    
 2.5% 
 l 83    
 2.5% 
 d 83    
 2.5% 
Most occurring categories
Value Count Frequency (%)
 (unknown) 3381  100.0% 
Most frequent character per category
(unknown)
Value Count Frequency (%)
 m 808  23.9% 
 a 808  23.9% 
 n 808  23.9% 
 w 271    
 8.0% 
 o 271    
 8.0% 
 c 83    
 2.5% 
 h 83    
 2.5% 
 i 83    
 2.5% 
 l 83    
 2.5% 
 d 83    
 2.5% 
Most occurring scripts
Value Count Frequency (%)
 (unknown) 3381  100.0% 
Most frequent character per script
(unknown)
Value Count Frequency (%)
 m 808  23.9% 
 a 808  23.9% 
 n 808  23.9% 
 w 271    
 8.0% 
 o 271    
 8.0% 
 c 83    
 2.5% 
 h 83    
 2.5% 
 i 83    
 2.5% 
 l 83    
 2.5% 
 d 83    
 2.5% 
Most occurring blocks
Value Count Frequency (%)
 (unknown) 3381  100.0% 
Most frequent character per block
(unknown)
Value Count Frequency (%)
 m 808  23.9% 
 a 808  23.9% 
 n 808  23.9% 
 w 271    
 8.0% 
 o 271    
 8.0% 
 c 83    
 2.5% 
 h 83    
 2.5% 
 i 83    
 2.5% 
 l 83    
 2.5% 
 d 83    
 2.5% 
adult_male
Boolean
Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 1023.0 B
 True  537  
 False  354  
Common Values (Table)
Common Values (Plot)
Value Count Frequency (%)
 True 537  60.3% 
 False 354  39.7% 
deck
Categorical
MISSING  
Distinct 7
Distinct (%) 3.4%
Missing 688
Missing (%) 77.2%
Memory size 1.3 KiB
 C  59  
 B  47  
 D  33  
 E  32  
 A  15  
 Other values (2)  17  
Overview
Categories
Words
Characters
Length
Max length 1
Median length 1
Mean length 1
Min length 1
Characters and Unicode
Total characters 203
Distinct characters 7
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
 The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables. 
Unique
Unique 0 ?
Unique (%) 0.0%
Sample
1st row C
2nd row C
3rd row E
4th row G
5th row C
Common Values
Value Count Frequency (%)
 C 59    
 6.6% 
 B 47    
 5.3% 
 D 33    
 3.7% 
 E 32    
 3.6% 
 A 15    
 1.7% 
 F 13    
 1.5% 
 G 4    
 0.4% 
 (Missing) 688  77.2% 
Length
 Histogram of lengths of the category 
Common Values (Plot)
Value Count Frequency (%)
 c 59  29.1% 
 b 47  23.2% 
 d 33  16.3% 
 e 32  15.8% 
 a 15    
 7.4% 
 f 13    
 6.4% 
 g 4    
 2.0% 
Characters
Categories
Scripts
Blocks
Most occurring characters
Value Count Frequency (%)
 C 59  29.1% 
 B 47  23.2% 
 D 33  16.3% 
 E 32  15.8% 
 A 15    
 7.4% 
 F 13    
 6.4% 
 G 4    
 2.0% 
Most occurring categories
Value Count Frequency (%)
 (unknown) 203  100.0% 
Most frequent character per category
(unknown)
Value Count Frequency (%)
 C 59  29.1% 
 B 47  23.2% 
 D 33  16.3% 
 E 32  15.8% 
 A 15    
 7.4% 
 F 13    
 6.4% 
 G 4    
 2.0% 
Most occurring scripts
Value Count Frequency (%)
 (unknown) 203  100.0% 
Most frequent character per script
(unknown)
Value Count Frequency (%)
 C 59  29.1% 
 B 47  23.2% 
 D 33  16.3% 
 E 32  15.8% 
 A 15    
 7.4% 
 F 13    
 6.4% 
 G 4    
 2.0% 
Most occurring blocks
Value Count Frequency (%)
 (unknown) 203  100.0% 
Most frequent character per block
(unknown)
Value Count Frequency (%)
 C 59  29.1% 
 B 47  23.2% 
 D 33  16.3% 
 E 32  15.8% 
 A 15    
 7.4% 
 F 13    
 6.4% 
 G 4    
 2.0% 
embark_town
Categorical
Distinct 3
Distinct (%) 0.3%
Missing 2
Missing (%) 0.2%
Memory size 7.1 KiB
 Southampton  644  
 Cherbourg  168  
 Queenstown  77  
Overview
Categories
Words
Characters
Length
Max length 11
Median length 11
Mean length 10.535433
Min length 9
Characters and Unicode
Total characters 9366
Distinct characters 17
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
 The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables. 
Unique
Unique 0 ?
Unique (%) 0.0%
Sample
1st row Southampton
2nd row Cherbourg
3rd row Southampton
4th row Southampton
5th row Southampton
Common Values
Value Count Frequency (%)
 Southampton 644  72.3% 
 Cherbourg 168    
 18.9% 
 Queenstown 77    
 8.6% 
 (Missing) 2    
 0.2% 
Length
 Histogram of lengths of the category 
Common Values (Plot)
Value Count Frequency (%)
 southampton 644  72.4% 
 cherbourg 168    
 18.9% 
 queenstown 77    
 8.7% 
Characters
Categories
Scripts
Blocks
Most occurring characters
Value Count Frequency (%)
 o 1533  16.4% 
 t 1365  14.6% 
 u 889  9.5% 
 h 812  8.7% 
 n 798  8.5% 
 p 644  6.9% 
 S 644  6.9% 
 m 644  6.9% 
 a 644  6.9% 
 r 336    
 3.6% 
 Other values (7) 1057  11.3% 
Most occurring categories
Value Count Frequency (%)
 (unknown) 9366  100.0% 
Most frequent character per category
(unknown)
Value Count Frequency (%)
 o 1533  16.4% 
 t 1365  14.6% 
 u 889  9.5% 
 h 812  8.7% 
 n 798  8.5% 
 p 644  6.9% 
 S 644  6.9% 
 m 644  6.9% 
 a 644  6.9% 
 r 336    
 3.6% 
 Other values (7) 1057  11.3% 
Most occurring scripts
Value Count Frequency (%)
 (unknown) 9366  100.0% 
Most frequent character per script
(unknown)
Value Count Frequency (%)
 o 1533  16.4% 
 t 1365  14.6% 
 u 889  9.5% 
 h 812  8.7% 
 n 798  8.5% 
 p 644  6.9% 
 S 644  6.9% 
 m 644  6.9% 
 a 644  6.9% 
 r 336    
 3.6% 
 Other values (7) 1057  11.3% 
Most occurring blocks
Value Count Frequency (%)
 (unknown) 9366  100.0% 
Most frequent character per block
(unknown)
Value Count Frequency (%)
 o 1533  16.4% 
 t 1365  14.6% 
 u 889  9.5% 
 h 812  8.7% 
 n 798  8.5% 
 p 644  6.9% 
 S 644  6.9% 
 m 644  6.9% 
 a 644  6.9% 
 r 336    
 3.6% 
 Other values (7) 1057  11.3% 
alive
Boolean
Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 1023.0 B
 False  549  
 True  342  
Common Values (Table)
Common Values (Plot)
Value Count Frequency (%)
 False 549  61.6% 
 True 342  38.4% 
alone
Boolean
Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 1023.0 B
 True  537  
 False  354  
Common Values (Table)
Common Values (Plot)
Value Count Frequency (%)
 True 537  60.3% 
 False 354  39.7% 
Interactions
age
sibsp
parch
fare
fare
age
sibsp
parch
fare
age
sibsp
parch
fare
age
sibsp
parch
fare
age
sibsp
parch
Missing values
Count
Matrix
 A simple visualization of nullity by column. 
 Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion. 
Sample
First rows
Last rows
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
7 0 3 male 2.0 3 1 21.0750 S Third child False NaN Southampton no False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
881 0 3 male 33.0 0 0 7.8958 S Third man True NaN Southampton no True
882 0 3 female 22.0 0 0 10.5167 S Third woman False NaN Southampton no True
883 0 2 male 28.0 0 0 10.5000 S Second man True NaN Southampton no True
884 0 3 male 25.0 0 0 7.0500 S Third man True NaN Southampton no True
885 0 3 female 39.0 0 5 29.1250 Q Third woman False NaN Queenstown no False
886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True
Duplicate rows
Most frequently occurring
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone # duplicates
35 0 3 male NaN 0 0 7.8958 S Third man True NaN Southampton no True 13
36 0 3 male NaN 0 0 8.0500 S Third man True NaN Southampton no True 12
33 0 3 male NaN 0 0 7.7500 Q Third man True NaN Queenstown no True 8
46 1 3 female NaN 0 0 7.7500 Q Third woman False NaN Queenstown yes True 7
9 0 2 male NaN 0 0 0.0000 S Second man True NaN Southampton no True 6
30 0 3 male NaN 0 0 7.2250 C Third man True NaN Cherbourg no True 5
31 0 3 male NaN 0 0 7.2292 C Third man True NaN Cherbourg no True 5
40 0 3 male NaN 8 2 69.5500 S Third man True NaN Southampton no False 4
2 0 2 male 23.0 0 0 13.0000 S Second man True NaN Southampton no True 3
3 0 2 male 25.0 0 0 13.0000 S Second man True NaN Southampton no True 3
Report generated by YData.
" frameborder="0" allowfullscreen="">

The generated report includes sections on:


  Overview: Summary statistics, dataset size, and variable types.
  Variables: Each variable’s (column’s) distributions, missing values, and unique counts.
  Interactions: Visualizations to explore potential relationships between variables.
  Correlations: Statistical measures of how variables relate to each other.
  Missing Values: Detailed analysis of missing data in the dataset.
  Sample: A preview of the dataset rows.
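
To make the Overview and Variables numbers a little more concrete, here is a rough sketch of how figures such as "Missing", "Distinct", and "Missing cells (%)" could be computed by hand. It uses only the standard library and is not how ydata-profiling computes them internally. The six rows are hand-copied for illustration with only three columns, `None` stands in for a missing value, and the helper name `summarize` is hypothetical.

```python
# Six rows hand-copied for illustration; None stands in for a missing value (NaN).
rows = [
    {"sex": "male", "age": 22.0, "deck": None},
    {"sex": "female", "age": 38.0, "deck": "C"},
    {"sex": "female", "age": 26.0, "deck": None},
    {"sex": "female", "age": 35.0, "deck": "C"},
    {"sex": "male", "age": 35.0, "deck": None},
    {"sex": "male", "age": None, "deck": None},
]


def summarize(rows):
    """Per-column missing and distinct counts, plus the overall share of missing cells."""
    columns = list(rows[0])
    per_column = {}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        per_column[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    total_cells = len(rows) * len(columns)
    missing_cells = sum(stats["missing"] for stats in per_column.values())
    return per_column, 100 * missing_cells / total_cells


per_column, missing_pct = summarize(rows)
for col, stats in per_column.items():
    print(col, stats)
print(f"Missing cells: {missing_pct:.1f}%")
```

The report does the same kind of bookkeeping for every column of the full dataset and adds the plots and alerts on top.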


You can also customize the report. I usually keep all the defaults, but here is an example of what can be changed.

profile = ProfileReport(df, 
                        title='Titanic Data Report', 
                        explorative=True,
                        dark_mode=True,  # Enable dark mode for the report
                        correlations={
                            "pearson": {"calculate": False},  # Disable Pearson correlation
                            "spearman": {"calculate": True},  # Enable Spearman correlation
                            "kendall": {"calculate": True}    # Enable Kendall correlation
                        },
                        duplicates={"calculate": False},  # Disable duplicate row detection
                        interactions={"continuous": True},  # Enable interactions for continuous variables
                        missing_diagrams={
                            "bar": True,  # Show bar chart for missing values
                            "matrix": True,  # Show matrix of missing values
                            "heatmap": True,  # Show heatmap of missing value correlations
                            "dendrogram": True  # Show dendrogram of missing value correlations
                        },
                        samples={"head": 10, "tail": 10},  # Show first and last 10 rows of the dataset
                        sensitive=True,  # Treat all variables as sensitive, minimizing detailed output
                        sort="ascending",  # Sort variables in ascending order
                        pool_size=2,  # Number of processes to use for parallel processing
                        variables={
                            "descriptions": {
                                # Custom descriptions; keys must match the column names (lowercase in this dataset)
                                "age": "Age of the passenger",
                                "sex": "Gender of the passenger",
                            }
                        },
                        minimal=True,  # Generate a minimal report for faster rendering
                        progress_bar=True,  # Display a progress bar during report generation
                        infer_dtypes=False,  # Disable automatic datatype inference
                        html={"style": {"full_width": True, "theme": "flatly"}}  # Apply full width and 'flatly' theme for HTML output
                        )



profile.to_notebook_iframe()
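
Since every customization above is a keyword argument, one way to keep a long call manageable is to hold the options in a plain dictionary and unpack it when creating the report. The following is only a sketch under that assumption: `base_options`, `with_overrides`, and `quick_options` are hypothetical names, the dictionary holds a subset of the options used above, and the `ProfileReport` call is left as a comment because it needs ydata-profiling installed and a DataFrame `df` already loaded.

```python
from copy import deepcopy

# A subset of the options from the call above; the dict name is just illustrative.
base_options = {
    "title": "Titanic Data Report",
    "explorative": True,
    "correlations": {
        "pearson": {"calculate": False},
        "spearman": {"calculate": True},
        "kendall": {"calculate": True},
    },
    "samples": {"head": 10, "tail": 10},
}


def with_overrides(options, **overrides):
    """Return a deep copy of options with top-level keys replaced by overrides."""
    merged = deepcopy(options)
    merged.update(overrides)
    return merged


# A faster variant for a first look, leaving the base options untouched.
quick_options = with_overrides(base_options, title="Titanic Quick Look", minimal=True)

# With ydata-profiling installed and `df` loaded, the dict would be unpacked like this:
# profile = ProfileReport(df, **quick_options)
print(quick_options["title"])
print(base_options["title"])
```

Copying before updating means several report variants can share one base configuration without accidentally changing each other.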


[Embedded interactive report: "Titanic Data Report", with the tabs Overview, Variables, Interactions, Missing values, and Sample]


