Bugs are the bane of programmers' existence. But like death and taxes... well, death, anyway*: you can't get away from them entirely. This is especially true in large applications, including those with multiple schemata, which is what I'll be discussing here.
Our application uses multiple tenant schemata, one per client, and their structures must all match. We ran into a situation that called for checking table structures between our template schema and a customer schema, which led to the development of this query.
The query utilizes the information_schema, which is defined by the SQL standard and present in most RDBMS engines, so the general approach is fairly portable. That said, the version below is written for PostgreSQL and leans on a few Postgres conveniences (::text casts, the ~ regex operator, the udt_name column), so other engines would need minor adjustments.
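If you haven't explored it before, information_schema is just a set of read-only views over the system catalogs. As a quick orientation, here's a small example of the kind of data the main query below builds on (the 'public' schema name is only a placeholder):
select table_name,
       column_name,
       udt_name,
       is_nullable,
       column_default
  from information_schema.columns
 where table_schema = 'public'
 order by table_name, ordinal_position;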
To use this query, you'll need to set two or three parameters, depending on whether you want to compare every table in the schemata or just one.
Let's now examine the query:
with check_params as (
select 'my-schema-to-check' as check_schema, -- put schema to check here!
'my-known-good-schema' as main_schema, -- put the known good schema here
null::text as check_table -- put table name to check here (or null for all)!
),
display_table as (
select coalesce(mn.table_name, ck.table_name) as table_name,
case when mn.table_schema is null
then ck.table_name || ' <null>'
when ck.table_schema is null
then 'null' || ' <' || mn.table_name || '>'
else mn.table_name
end::text as disp_table_name
from (
select table_schema,
table_name
from information_schema.tables
cross
join check_params cp
where table_schema = cp.main_schema
) as mn
full
join (
select table_schema,
table_name
from information_schema.tables
cross
join check_params cp
where table_schema = cp.check_schema
) as ck
on ck.table_name = mn.table_name
),
full_table_compare as (
select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
(select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
case when c.column_name is distinct from t.column_name
then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
else t.column_name
end::text as "column_name",
case when c.udt_name is distinct from t.udt_name
then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
else t.udt_name
end::text as column_type,
case when c.character_maximum_length is distinct from t.character_maximum_length
then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
else t.character_maximum_length
end::text as text_length,
case when c.numeric_precision is distinct from t.numeric_precision
then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
else t.numeric_precision
end::text as precision,
case when c.numeric_scale is distinct from t.numeric_scale
then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
else t.numeric_scale
end::text as scale,
case when c.is_nullable is distinct from t.is_nullable
then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
else t.is_nullable
end::text as is_nullable,
case when c.column_default is distinct from t.column_default
then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
else t.column_default
end::text as col_default
from (
-- Get info from the "good" schema
select cl.table_schema,
cl.table_name,
cl.column_name,
cl.udt_name,
coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
cp.check_schema,
cp.main_schema
from information_schema.columns cl
cross
join check_params cp
where cl.table_schema = cp.main_schema
and cl.table_name = coalesce(cp.check_table, cl.table_name)
) as t
full
join (
-- Get info from "suspect" schema
select cl.table_schema,
cl.table_name,
cl.column_name,
cl.udt_name,
coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default,
cp.check_schema,
cp.main_schema
from information_schema.columns cl
cross
join check_params cp
where cl.table_schema = cp.check_schema
and cl.table_name = coalesce(cp.check_table, cl.table_name)
) as c
on c.table_name = t.table_name
and c.column_name = t.column_name
)
select schema_name,
table_name,
column_name,
column_type,
text_length,
precision,
scale,
is_nullable,
col_default
from full_table_compare
-- only report the differences
where table_name ~ '<'
or column_name ~ '<'
or column_type ~ '<'
or text_length ~ '<'
or precision ~ '<'
or scale ~ '<'
or is_nullable ~ '<'
or col_default ~ '<'
order
by table_name,
column_name;
Wow.
Let's break it down.
check_params
with check_params as (
select 'my-schema-to-check' as check_schema, -- put schema to check here!
'my-known-good-schema' as main_schema, -- put the known good schema here
null::text as check_table -- put table name to check here (or null for all)!
)
This is the CTE that holds our parameters. It was created so that we wouldn't have to fiddle with client-side script variables and would only have to define each value once. Replace the strings with the requisite schema names to compare all tables in both schemata. Replace the null::text
for the check_table parameter with a table name to constrain the comparison to that one table.
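For contrast, here's a minimal sketch of the client-side alternative the CTE avoids: psql variables. This only works inside psql (the CTE works from any client), and the values below are just placeholders.
-- psql only: define each value once, then interpolate it as a quoted literal
\set check_schema opp
\set main_schema eep
select table_name
  from information_schema.tables
 where table_schema = :'check_schema';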
display_table
display_table as (
select coalesce(mn.table_name, ck.table_name) as table_name,
case when mn.table_schema is null
then ck.table_name || ' <null>'
when ck.table_schema is null
then 'null' || ' <' || mn.table_name || '>'
else mn.table_name
end::text as disp_table_name
from (
select table_schema,
table_name
from information_schema.tables
cross
join check_params cp
where table_schema = cp.main_schema
) as mn
full
join (
select table_schema,
table_name
from information_schema.tables
cross
join check_params cp
where table_schema = cp.check_schema
) as ck
on ck.table_name = mn.table_name
)
This CTE is designed to help with output. Because we are comparing whether a table exists at all as well as whether each of its columns exists, we have a wrinkle: the table name is repeated on the information_schema.columns
record for every column. This CTE lets us display the table name correctly whether the difference is a missing table or just a column mismatch.
Why a full join
? We want to compare both ways.
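Here's a tiny, self-contained illustration (with made-up table names) of what that buys us: a full join keeps rows that exist on only one side, filling in the missing side with nulls, which is exactly how we spot tables present in only one schema.
select mn.table_name as main_table,
       ck.table_name as check_table
  from (values ('ork'), ('oof')) as mn(table_name)
  full join (values ('ork'), ('ah_ah')) as ck(table_name)
    on ck.table_name = mn.table_name;
-- returns three rows: ork|ork, oof|null, null|ah_ah (row order may vary)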
full_table_compare
full_table_compare as (
select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
(select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
case when c.column_name is distinct from t.column_name
then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
else t.column_name
end::text as "column_name",
case when c.udt_name is distinct from t.udt_name
then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
else t.udt_name
end::text as column_type,
case when c.character_maximum_length is distinct from t.character_maximum_length
then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
else t.character_maximum_length
end::text as text_length,
case when c.numeric_precision is distinct from t.numeric_precision
then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
else t.numeric_precision
end::text as precision,
case when c.numeric_scale is distinct from t.numeric_scale
then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
else t.numeric_scale
end::text as scale,
case when c.is_nullable is distinct from t.is_nullable
then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
else t.is_nullable
end::text as is_nullable,
case when c.column_default is distinct from t.column_default
then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
else t.column_default
end::text as col_default
from (
-- Get info from the "good" schema
select cl.table_schema,
cl.table_name,
cl.column_name,
cl.udt_name,
coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
cp.check_schema,
cp.main_schema
from information_schema.columns cl
cross
join check_params cp
where cl.table_schema = cp.main_schema
and cl.table_name = coalesce(cp.check_table, cl.table_name)
) as t
full
join (
-- Get info from "suspect" schema
select cl.table_schema,
cl.table_name,
cl.column_name,
cl.udt_name,
coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default,
cp.check_schema,
cp.main_schema
from information_schema.columns cl
cross
join check_params cp
where cl.table_schema = cp.check_schema
and cl.table_name = coalesce(cp.check_table, cl.table_name)
) as c
on c.table_name = t.table_name
and c.column_name = t.column_name
)
So there are two main parts to this query.
schema information
select cl.table_schema,
cl.table_name,
cl.column_name,
cl.udt_name,
coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
cp.check_schema,
cp.main_schema
from information_schema.columns cl
cross
join check_params cp
where cl.table_schema = cp.main_schema
and cl.table_name = coalesce(cp.check_table, cl.table_name)
We need to get specific information from the information_schema regarding the columns of the target table(s) in the main or good or master or template schema (whatever you wish to call it). This is far from exhaustive, but it's a good place to start for basic structure comparisons. We also need to get the same information for the target, suspect, bad, or check schema (again, whatever you wish to call it).
This information is:
- table_schema : Namespace of the table
- table_name : Name of the table
- column_name : Name of the column
- udt_name : Name of the (internal) data type of the column
- character_maximum_length : Length of a char or varchar column. This will be null for a text type column.
- numeric_precision : Precision of a numeric/decimal column
- numeric_scale : Scale of a numeric/decimal column
- is_nullable : YES if the column can contain null, else NO
- column_default : The default value for the column
Some operations are done in these subqueries to make the comparison logic easier, such as casting everything to text and scrubbing schema prefixes out of column defaults.
Full joins are again used to allow for bi-directional comparisons.
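The least obvious of those operations is the regexp_replace on column_default. A sequence-backed default is typically reported with a schema-qualified sequence name (when that schema isn't on the search_path), so it would never match its counterpart in the other schema unless the prefix is stripped. Here's a small self-contained sketch, borrowing a value from the demo schemata later in this post:
select regexp_replace(
         'nextval(''eep.ork_id_seq''::regclass)',  -- column_default as reported for eep.ork.id
         'eep\.', '', 'g'
       ) as scrubbed_default;
-- nextval('ork_id_seq'::regclass), which now matches the opp side after its own scrub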
comparison logic
select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
(select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
case when c.column_name is distinct from t.column_name
then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
else t.column_name
end::text as "column_name",
case when c.udt_name is distinct from t.udt_name
then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
else t.udt_name
end::text as column_type,
case when c.character_maximum_length is distinct from t.character_maximum_length
then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
else t.character_maximum_length
end::text as text_length,
case when c.numeric_precision is distinct from t.numeric_precision
then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
else t.numeric_precision
end::text as precision,
case when c.numeric_scale is distinct from t.numeric_scale
then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
else t.numeric_scale
end::text as scale,
case when c.is_nullable is distinct from t.is_nullable
then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
else t.is_nullable
end::text as is_nullable,
case when c.column_default is distinct from t.column_default
then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
else t.column_default
end::text as col_default
This is where the rubber meets the road: all of the comparisons between the records from the two schemata happen here. It checks for differences and reports them in a consistent format. Any difference will be reported as
suspect_schema_value <good_schema_value>
meaning that the "good" values will be surrounded by <
and >
characters.
If a column has a null value it will appear as null
or <null>
depending on whether the value originated from the "suspect" or "good" schema.
If the values are identical, then there will only be the plaintext value listed as no other indicators are necessary.
The use of IS DISTINCT FROM was crucial to simplify the comparisons. This is an operator that treats null as if it were an ordinary value instead of, well... null. If we pretend null has a value, then x IS DISTINCT FROM y
behaves roughly like coalesce(x, '<null>') != coalesce(y, '<null>')
and x IS NOT DISTINCT FROM y
behaves roughly like coalesce(x, '<null>') = coalesce(y, '<null>')
. Check out the documentation for the official explanation.
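A quick self-contained demonstration of that null handling (the ::text casts are only there so the comparison types resolve):
select null::text = 'x'                  as equals_result,        -- null: plain equality can't decide
       null::text is distinct from 'x'   as distinct_from_value,  -- true: null counts as different from 'x'
       null::text is distinct from null  as distinct_from_null;   -- false: two nulls count as the same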
assembly
select schema_name,
table_name,
column_name,
column_type,
text_length,
precision,
scale,
is_nullable,
col_default
from full_table_compare
-- only report the differences
where table_name ~ '<'
or column_name ~ '<'
or column_type ~ '<'
or text_length ~ '<'
or precision ~ '<'
or scale ~ '<'
or is_nullable ~ '<'
or col_default ~ '<'
order
by table_name,
column_name;
This is a lot of data and, on a good day, it will be mostly if not entirely in sync, so we're really only interested in the differences. This final query pulls from the output of the full_table_compare
CTE only those records where some column contains a <
character denoting a difference. We check every column except schema_name
, since it will always have the format suspect_schema <good_schema>
as an indicator of where the reported values came from.
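A side note on portability: ~ is PostgreSQL's regular-expression match operator, and since we're only hunting for a literal < character, a plain LIKE would work just as well on engines without it. A quick self-contained check of both predicates:
select 'varchar <text>' ~ '<'       as regex_match,  -- true: regex match on a literal <
       'varchar <text>' like '%<%'  as like_match;   -- true: same test without regex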
Demo time!
Let's start by creating two schemata:
postgres=# create schema eep;
CREATE SCHEMA
postgres=# create schema opp;
CREATE SCHEMA
Now we'll create some sample tables to play with.
postgres=# create table eep.ork
postgres-# (
postgres(# id bigserial primary key,
postgres(# label text,
postgres(# created_ts timestamptz,
postgres(# data jsonb
postgres(# );
CREATE TABLE
postgres=# create table eep.oof
postgres-# (
postgres(# uniq_label text primary key
postgres(# );
CREATE TABLE
postgres=# create table opp.ork
postgres-# (
postgres(# id bigserial primary key,
postgres(# label text,
postgres(# created_ts timestamptz,
postgres(# data jsonb
postgres(# );
CREATE TABLE
postgres=# create table opp.ah_ah
postgres-# (
postgres(# id uuid primary key,
postgres(# authentication_header jsonb
postgres(# );
CREATE TABLE
We'll change our query parameters to say the eep
schema is "good" and opp
is "suspect". We'll also set it to only look at the ork
table.
with check_params as (
select 'opp' as check_schema, -- put schema to check here!
'eep' as main_schema, -- put the known good schema here
'ork'::text as check_table -- put table name to check here (or null for all)!
),
Running the query, we see the following output:
schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default
-------------+------------+-------------+-------------+-------------+-----------+-------+-------------+-------------
(0 rows)
Wonderful! No differences!
Let's expand to all of the tables.
with check_params as (
select 'opp' as check_schema, -- put schema to check here!
'eep' as main_schema, -- put the known good schema here
null::text as check_table -- put table name to check here (or null for all)!
),
Running the query, we now see:
schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default
-------------+--------------+------------------------------+--------------+-------------+-------------+-------------+-------------+-------------
opp <eep> | ah_ah <null> | authentication_header <null> | jsonb <null> | null <null> | null <null> | null <null> | YES <null> |
opp <eep> | ah_ah <null> | id <null> | uuid <null> | null <null> | null <null> | null <null> | YES <null> |
opp <eep> | null <oof> | null <uniq_label> | null <text> | null <null> | null <null> | null <null> | null <NO> |
(3 rows)
So, let's examine the output.
First, there's the schema_name
column. This shows the suspect schema followed by the good schema (surrounded by <
and >
chars).
Next we have the table_name
column. Note that there is an ah_ah
table that is in the opp
schema, but not in the eep
schema. Also, there is an oof
table that is in the eep
schema that is not in the opp
schema.
The rest of the columns show the differences at each column: type, length, precision, scale, nullable, default.
So we can spot missing tables, but what about actual differences in tables that exist in both schemata?
Let's change opp.ork
a bit.
- Change the label column from text to varchar(256):
postgres=# alter table opp.ork alter column label set data type varchar(256);
ALTER TABLE
- Add a new updated_ts column:
postgres=# alter table opp.ork add column updated_ts timestamptz;
ALTER TABLE
Just for fun, let's also alter eep.ork
.
- Add an audited column:
postgres=# alter table eep.ork add column audited boolean not null default false;
ALTER TABLE
Now let's constrain the query back to only the ork
table:
with check_params as (
select 'opp' as check_schema, -- put schema to check here!
'eep' as main_schema, -- put the known good schema here
'ork'::text as check_table -- put table name to check here (or null for all)!
),
Now, running the query, we see:
schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default
-------------+------------+-------------------+--------------------+-------------+-------------+-------------+-------------+--------------
opp <eep> | ork | label | varchar <text> | 256 <null> | null | null | YES |
opp <eep> | ork | null <audited> | null <bool> | null <null> | null <null> | null <null> | null <NO> | null <false>
opp <eep> | ork | updated_ts <null> | timestamptz <null> | null <null> | null <null> | null <null> | YES <null> |
(3 rows)
So we've caught the extra column audited
in eep.ork
, the extra column updated_ts
in opp.ork
and the change to the common label
column.
In closing, this query is a handy tool for comparing tables between schemata, and it could also serve as the basis for extending comparisons across databases on the same engine; one possible approach is sketched below.
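A hedged sketch of that extension on PostgreSQL: pull the remote schema in as foreign tables via postgres_fdw (foreign tables show up in information_schema just like local ones), then point check_schema at the imported copy. The server, host, schema, and credential names below are placeholders.
create extension if not exists postgres_fdw;
create server other_db foreign data wrapper postgres_fdw
    options (host 'other-host.example.com', dbname 'otherdb');
create user mapping for current_user server other_db
    options (user 'compare_user', password 'secret');
create schema remote_check;
import foreign schema customer_schema
  from server other_db into remote_check
  options (import_default 'true');  -- bring column defaults across so col_default can be compared
-- now set check_schema to 'remote_check' in check_params and run the comparison query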
Hopefully this proves useful to other database developers.
*I always pay my tax. Really!