There’s a common warning in pandas, the SettingWithCopyWarning. While the warning message covers some of the possible causes, it doesn’t cover them all. In this post, I’ll show another source of the warning and how to fix it.
import os
import pandas as pd
os.getenv('DATA')
'I:\\Data'
shakespeare_path = os.path.join(os.getenv('DATA'), 'shakespeare.csv')
df = pd.read_csv(shakespeare_path)
df
|   | Name | Year | Category |
|---|---|---|---|
| 0 | Titus Andronicus | 1592 | Tragedy |
| 1 | The Comedy of Errors | 1594 | Comedy |
| 2 | Richard II | 1595 | History |
| 3 | Romeo and Juliet | 1595 | Tragedy |
| 4 | A Midsummer Night’s Dream | 1595 | Comedy |
| 5 | King John | 1596 | History |
| 6 | Julius Caesar | 1599 | Tragedy |
| 7 | Othello | 1604 | Tragedy |
| 8 | Macbeth | 1606 | Tragedy |
Let’s say you want to make a subset of the data by selecting a couple of columns.
sdf = df[['Name', 'Year']]
sdf
|   | Name | Year |
|---|---|---|
| 0 | Titus Andronicus | 1592 |
| 1 | The Comedy of Errors | 1594 |
| 2 | Richard II | 1595 |
| 3 | Romeo and Juliet | 1595 |
| 4 | A Midsummer Night’s Dream | 1595 |
| 5 | King John | 1596 |
| 6 | Julius Caesar | 1599 |
| 7 | Othello | 1604 |
| 8 | Macbeth | 1606 |
Then you want to continue cleaning it up.
sdf.loc[:, 'Year'] = pd.to_datetime(sdf['Year'])
C:\Users\Julius\anaconda3\lib\site-packages\pandas\core\indexing.py:1773: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
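The warning message doesn’t make any sense: you’re already doing what it suggests. The problem is not the assignment itself but how sdf was created. Because it was selected from df, pandas treats it as a possible copy of a slice of df and warns on any later assignment to it. To avoid this, make an explicit, independent copy of the subset using .copy().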
sdf2 = df[['Name', 'Year']].copy()
sdf2.loc[:, 'Year'] = pd.to_datetime(sdf2['Year'])
No more warning!
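
If you want to try the fix without the CSV file, here is a minimal sketch. The inline DataFrame is a hypothetical stand-in for shakespeare.csv, and the Decade column is only an illustration, not part of the original cleanup. Adding a column is another assignment that can trigger the same warning on a subset that was not explicitly copied.

```python
import pandas as pd

# Hypothetical inline data standing in for shakespeare.csv
df = pd.DataFrame({
    "Name": ["Titus Andronicus", "The Comedy of Errors", "Richard II"],
    "Year": [1592, 1594, 1595],
    "Category": ["Tragedy", "Comedy", "History"],
})

# Explicit copy: sdf2 is an independent DataFrame, not tied to df
sdf2 = df[["Name", "Year"]].copy()

# Illustrative assignment: floor each year to its decade
sdf2["Decade"] = sdf2["Year"] // 10 * 10

print(sdf2)
print(df.columns.tolist())  # df itself is left untouched
```

Because sdf2 is its own object, the assignment is unambiguous: it changes sdf2 and leaves df alone.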