This notebook contain my cheat sheet for working with databases. It’s is hiding here while it’s in draft.
Table of Contents
- Getting Started Once the Database Is Set Up
- Administering the Database
- Entering into database
- Some samples
- Using pandas
- Useful commands
Getting Started Once the Database Is Set Up
You can do this either in a jupyter notebook or in a shell.
In a shell, enter
Note that the user names are case sensitive. So if you set up an account with your username
Julius it will be
julius. So you have to log in as follows:
psql -d postgres -U julius
You can also log in to the admin account like so:
psql -d postgres -U postgres
Creating a Table
Let’s create a table with PostGIS geometries in it.
CREATE TABLE IF NOT EXISTS geom_table (id serial primary key, polygon_geom geometry(Polygon, 4326), geometry_geom geometry(Geometry, 4326));
Administering the Database
One way to administer your database is with pdAdmin. You’ll probably find that for most things you prefer the command line, but for some things it’s nice to have a UI.
You can see the users here:
postgres is the admin account.
Here’s how to update permissions.
GRANT UPDATE, DELETE, REFERENCES, INSERT, TRUNCATE, SELECT, TRIGGER ON TABLE public.geom_table TO julius;
Again, you can do this in the pgAdmin.
You’ll also have to give permission to sequences:
GRANT UPDATE, USAGE, SELECT ON SEQUENCE public.geom_table_id_seq TO julius;
Query a table
select * from geom_table limit 10
Entering into database
Now let’s add some stuff to our table
insert into geom_table (polygon_geom, geometry_geom) values ('0103000020E61000000100000005000000EF30A5DE4B855EC0C3308F769AB347406ED22F8149855EC0DD60A6D99AB347404AB842264A855EC035C13C23A2B34740CD16B8834C855EC01B9125C0A1B34740EF30A5DE4B855EC0C3308F769AB34740', '0103000020E61000000100000005000000EF30A5DE4B855EC0C3308F769AB347406ED22F8149855EC0DD60A6D99AB347404AB842264A855EC035C13C23A2B34740CD16B8834C855EC01B9125C0A1B34740EF30A5DE4B855EC0C3308F769AB34740')
I don’t like doing a lot from the command line tools. It doesn’t give you enough feedback if you make a mistake. For example, if you try to insert into a table that doesn’t exist (perhaps you had a typo in the name) it doesn’t give you an error.
INSERT INTO geom_table (polygon_geom, geometry_geom) VALUES (GeomFromText('POLYGON ((4 5, 5 5, 5 4, 4 4, 4 5))'), GeomFromText('POLYGON ((4 5, 5 5, 5 4, 4 4, 4 5))'))
Note that you can also get weird errors. For example, when I run this:
INSERT INTO Customers(name, email, entry_num, age) VALUES ("John Smith", "firstname.lastname@example.org", 100, 25);
The error is weird
The solution is to change your single quotes to double quotes.
CREATE TABLE Customers ( name text, email text, entry_num int, age int ); INSERT INTO Customers(name, email, entry_num, age) VALUES ("John Smith", "email@example.com", 100, 25); INSERT INTO Customers(name, email, entry_num, age) VALUES ("Karen Smith", "firstname.lastname@example.org", 101, 42); INSERT INTO Customers(name, email, entry_num, age) VALUES ("John Smith", "email@example.com", 102, 25); select entry_num, count(*) from customers group by entry_num having count(*)>1
Return all the names that are duplicated and the number of entries
select name, email, count(*) from Customers group by name, email having count(*) > 1
with subquery as ( select name, email, entry_num, row_number() over(partition by email) as rk from Customers ) select name, email, entry_num from subquery where rk = 1
Select all rows that are part of a duplicate
select t.* from Customers as t inner join ( select name, email, age, count(*) from Customers group by email, name, age having count(*) >= 2) as tt on t.email=tt.email and t.name=tt.name and t.age=tt.age
Get the first entry of every person
select name, email, age, min(entry_num) as entry_num from Customers group by email, name, age
You can load results into a DataFrame like so:
with db_session(engine=my_engine) as sess: records = sess.execute(query_string) df = pd.DataFrame(records)
But you can also do it directly using
with db_session(engine=my_engine) as sess: df = pd.read_sql(query_string, sess.bind)
update my_table set deleted=now() where ...
If you’re coming from a Python world, remember, single quotes and double quotes are not interchangeable!
->> operator gets the JSONB array element as a text, while the
-> operator gets it as JSONB. If you want to process the element as JSONB, use
->; if you need it as text for comparison or other operations, use