This notebook contains my cheat sheet for working with PostgreSQL databases.
Table of Contents
- Getting Started Once the Database Is Set Up
- Administering the Database
- Permissions
- Entering Data into the Database
- Some samples
- Using pandas
- Useful commands
- Reminders
- JSONB
Getting Started Once the Database Is Set Up
You can do this either in a Jupyter notebook or in a shell.
In a shell, enter psql
Note that unquoted user names are folded to lowercase. So if you set up an account with the username Julius, it will be stored as julius, and you have to log in as follows:
psql -d postgres -U julius
You can also log in to the admin account like so:
psql -d postgres -U postgres
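If the server isn't local or doesn't listen on the default port, you can also pass the host and port explicitly (the host name and port here are just placeholders):
psql -h localhost -p 5432 -d postgres -U julius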
Creating a Table
Let’s create a table with PostGIS geometries in it.
CREATE TABLE IF NOT EXISTS geom_table (id serial primary key, polygon_geom geometry(Polygon, 4326), geometry_geom geometry(Geometry, 4326));
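If the geometry type isn't recognized, PostGIS probably isn't enabled in this database yet; assuming you have the privileges for it, enable the extension first:
CREATE EXTENSION IF NOT EXISTS postgis;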
Administering the Database
One way to administer your database is with pgAdmin. You’ll probably find that for most things you prefer the command line, but for some things it’s nice to have a UI.
You can see the users here: julius and postgres. postgres is the admin account.
Permissions
Here’s how to update permissions.
GRANT UPDATE, DELETE, REFERENCES, INSERT, TRUNCATE, SELECT, TRIGGER ON TABLE public.geom_table TO julius;
Again, you can do this in pgAdmin.
You’ll also have to grant permissions on sequences:
GRANT UPDATE, USAGE, SELECT ON SEQUENCE public.geom_table_id_seq TO julius;
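To double-check what ended up granted, psql can show a table's (or sequence's) access privileges:
\dp public.geom_table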
Looking Around
\dt
Query a table
select * from geom_table limit 10
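A few other standard psql meta-commands that come in handy here:
\d geom_table
\du
\l
\d describes a table (columns, types, and indexes), \du lists roles, and \l lists databases.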
Entering Data into the Database
Now let’s add some stuff to our table:
insert into geom_table (polygon_geom, geometry_geom) values ('0103000020E61000000100000005000000EF30A5DE4B855EC0C3308F769AB347406ED22F8149855EC0DD60A6D99AB347404AB842264A855EC035C13C23A2B34740CD16B8834C855EC01B9125C0A1B34740EF30A5DE4B855EC0C3308F769AB34740', '0103000020E61000000100000005000000EF30A5DE4B855EC0C3308F769AB347406ED22F8149855EC0DD60A6D99AB347404AB842264A855EC035C13C23A2B34740CD16B8834C855EC01B9125C0A1B34740EF30A5DE4B855EC0C3308F769AB34740')
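To sanity-check what was inserted, ST_AsText converts a geometry back to readable WKT:
select id, ST_AsText(polygon_geom) from geom_table limit 10;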
I don’t like doing a lot from the command line tools. It doesn’t give you enough feedback if you make a mistake. For example, if you try to insert into a table that doesn’t exist (perhaps you had a typo in the name) it doesn’t give you an error.
INSERT INTO geom_table (polygon_geom, geometry_geom) VALUES (ST_GeomFromText('POLYGON ((4 5, 5 5, 5 4, 4 4, 4 5))', 4326), ST_GeomFromText('POLYGON ((4 5, 5 5, 5 4, 4 4, 4 5))', 4326));
Note that you can also get weird errors. For example, when I run this:
INSERT INTO Customers(name, email, entry_num, age) VALUES ("John Smith", "js@gmail.com", 100, 25);
The error is confusing: Postgres complains that a column called "John Smith" does not exist, because double quotes delimit identifiers (like column names), not string literals. The solution is to change the double quotes to single quotes.
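Here’s the fixed statement, with single quotes around the string literals:
INSERT INTO Customers(name, email, entry_num, age) VALUES ('John Smith', 'js@gmail.com', 100, 25);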
Some samples
CREATE TABLE Customers (
name text,
email text,
entry_num int,
age int
);
INSERT INTO Customers(name, email, entry_num, age) VALUES ('John Smith', 'js@gmail.com', 100, 25);
INSERT INTO Customers(name, email, entry_num, age) VALUES ('Karen Smith', 'ks@gmail.com', 101, 42);
INSERT INTO Customers(name, email, entry_num, age) VALUES ('John Smith', 'js@gmail.com', 102, 25);
Find entry_nums that appear more than once:
select entry_num, count(*) from customers
group by entry_num
having count(*)>1
Return all the names that are duplicated, along with the number of entries:
select name, email, count(*)
from Customers
group by name, email
having count(*) > 1
Another way to deal with duplicates is a window function that keeps just the first row per email:
with subquery as (
select name, email, entry_num, row_number() over(partition by email order by entry_num) as rk
from Customers
)
select name, email, entry_num
from subquery
where rk = 1
Select all rows that are part of a duplicate
select t.* from Customers as t
inner join
( select name, email, age, count(*) from Customers
group by email, name, age
having count(*) >= 2) as tt
on t.email=tt.email and t.name=tt.name and t.age=tt.age
Get the first entry of every person
select name, email, age, min(entry_num) as entry_num from Customers group by email, name, age
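PostgreSQL also has DISTINCT ON, which is a more idiomatic way to grab one row per group; this sketch keeps the row with the lowest entry_num per email:
select distinct on (email) name, email, age, entry_num
from Customers
order by email, entry_num;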
Using pandas
You can load results into a DataFrame like so:
import pandas as pd

# db_session is assumed to be a context manager yielding a SQLAlchemy session bound to my_engine
with db_session(engine=my_engine) as sess:
    records = sess.execute(query_string)
    df = pd.DataFrame(records)
But you can also do it directly using read_sql:
with db_session(engine=my_engine) as sess:
    df = pd.read_sql(query_string, sess.bind)
Useful commands
update my_table set deleted=now() where ...
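This assumes a nullable deleted timestamp column used for soft deletes; rows that haven't been soft-deleted can then be filtered like so:
select * from my_table where deleted is null;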
Reminders
If you’re coming from a Python world, remember: single quotes (for string literals) and double quotes (for identifiers) are not interchangeable!
JSONB
The ->> operator gets the JSONB array element as text, while the -> operator gets it as JSONB. If you want to process the element as JSONB, use ->; if you need it as text for comparison or other operations, use ->>.
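For example, assuming a hypothetical table json_docs with a jsonb column named data:
select data->>'name' as name_text,          -- text
       data->'items'->0 as first_item,      -- still jsonb
       data->'items'->>0 as first_item_text -- text
from json_docs;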