Archive

Archive for the ‘PostgreSQL’ Category

Scripting within psql

November 2, 2009 shoaibmir 1 comment

In one of my previous posts I was told of using PSQL instead of shell scripting to make things much simpler. That turned out to a good tip and with some help from my colleague at work I am now so used to PSQL in these kind of scenarios.

Just today I had to get the dead tuple percentages (using pgstattuple contrib) for tables within all databases on the server and here is how I was to get it done using PSQL and a simple shell script….

#!/bin/bash
PSQL="psql -A -t"
DBLIST=$($PSQL postgres -c "select datname from pg_database
where not datistemplate and datname not in ('postgres')")
for DB in $DBLIST; do
 echo $DB;
 psql $DB  < gettupd.sql > $DB.out
done
exit 0

gettupd.sql

\a
\t
\o /tmp/tupd.sql
select 'SELECT \'' || relname || '\', * from pgstattuple('''||schemaname||
'.'||relname||''');' from pg_stat_user_tables ;
\o
\i /tmp/tupd.sql
\! rm /tmp/tupd.sql
\a
\t

PSQL is just the way to go…  :)


Shoaib Mir
shoaibmir[@]gmail.com

A quick way to change index tablespace

September 29, 2009 shoaibmir 2 comments

On one of the database servers, I had to quickly move all indexes from all the databases on to a different disk in order to free up some space from the main disk. I thought of sharing it so that someone else wanting to do the same thing can get an idea for doing the same….

A quick how to,

First made a query to get all the indexes within a database using the following:

select pg_namespace.nspname||'.'||pg_class.relname  from pg_class, pg_namespace
where pg_class.relnamespace  = pg_namespace.oid and pg_class.relkind = 'i' and pg_class.relname
not like 'pg_%'

and then using a simple script, plugging in the the above mentioned query to do an ALTER INDEX for each index by setting the new table space:

#!/bin/bash
DNAME_LIST=$(psql postgres -A -t -c "select datname from pg_database where
not datistemplate and datname != 'postgres'")
for DATABASE in $DNAME_LIST; do
INAME_LIST=$($PSQL $DATABASE -A -t -c "select pg_namespace.nspname||'.'||pg_class.relname
from pg_class, pg_namespace where
pg_class.relnamespace  = pg_namespace.oid and pg_class.relkind = 'i' and pg_class.relname
not like 'pg_%'")
    ( for INDEX in $INAME_LIST; do echo "ALTER INDEX $INDEX SET TABLESPACE
new_disk_index;"; done ) | psql $DATABASE
done


Shoaib Mir
shoaibmir[@]gmail.com

Row size and Btree index

September 23, 2009 shoaibmir 2 comments

At work today saw an error while inserting data into a table:

ERROR: index row size 2962 exceeds btree maximum, 2713

Looking at the HINT in the error, found out that you cant have the row size of more then 1/3th of the PostgreSQL block size (8K by default) for a btree index on the column.

The best is to start using PostgreSQL full text search option for these kind of scenarios as it will be much faster and easier… but in my case that was not an option so had to somehow get around this problem without using text search engine.

After a little bit of research two possible solutions I could find out were (where column was of text data type):

-  Create a hash index on the same column as it will let you store more then 1/3rd of the block size, but a couple of things you need to consider before you go that route…

  • It lets you get the data inserted for more then the 1/3rd of the block size but that again has limitation, you cant go above the PG block size and even closer to that.
  • With Hash indexing you wont be able to use the text pattern matching functions “LIKE” and “ILIKE”, you are left with “=”

-  The other solution is to use the partial index where CREATE INDEX  will look something like this…

        create index .... where length(clm_name) < 500

…this will require changing the queries as well, now you will have to add “where length(clm_name) < 500″ while doing SELECTS.


Shoaib Mir
shoaibmir[@]gmail.com


Finding top read tables in the database

September 13, 2009 shoaibmir 2 comments

A quick and good way to find the most used tables in a database with regards to reads can be:

PostgreSQL provides a system catalog view “pg_stat_all_tables” for helping DBAs with finding such statistics.

The view looks like this:

Columns
-------------
relid
relname
seq_scan
seq_tup_read
idx_scan
idx_tup_fetch
n_tup_ins
n_tup_upd
n_tup_del

This view has one entry for each table in the database. The seq_scan column tells you how many sequential scans have been performed for a given table and seq_tup_read tells you how many rows were used during those sequence scan. The idx_scan and idx_tup_fetch gives the same kind of information for the index scan usage. The n_tup_ins, n_tup_upd and n_tup_del gives you information on how many rows were inserted, update and deleted.

We can use this view in the following way to get the top 10 most read tables…

SELECT relname, idx_tup_fetch + seq_tup_read as TotalReads from pg_stat_all_tables
WHERE idx_tup_fetch + seq_tup_read != 0
order by TotalReads desc
LIMIT 10;
(With thanks to my favorite book "PostgreSQL by Korry Douglas")


Shoaib Mir
shoaibmir[@]gmail.com

Setting up Slony cluster with PerlTools

August 5, 2009 shoaibmir 2 comments

A few days back while I was giving a training on Slony and was showing the students steps to building a Slony cluster using Perltools, they asked me to write down the steps for them as it was very hard to memorize everything. I just did that and thought of sharing it with everyone as it might help.

First of all make sure you have the perltools available for Slony and once you have confirmed that, follow the steps given below:

- Edit the slon_tools.conf file according to your cluster needs, i.e. change the nodes, sets information and tables to be replicated. This file by default is located in /usr/local/etc/slon_tools.conf and the Perltools will by default look into this directory for the configuration file.

- Now first of all you need to initialize the cluster

slonik_init_cluster | slonik

- Once that step is successful, start the Slon processes on each of the nodes involved

- Create the Sets, in our case we just had one set so the process goes as

slonik_create_set 1 | slonik

(Where 1 is the ID of the first set which we had set in the slon_tools.conf file)

- Subscribe the sets now, in our case we had just two nodes and one set

slonik_subscribe_set 1 2 | slonik

(Where 1 is the ID of the set and 2 is the node to which you want to subscribe)

- Once the subscription is successfull you should have replication all working now, which can be tested by adding a row in one of the replicated tables on master and verifying the change in your slave node.

Perltools really make life easy and takes rid of those extra steps for creating Slonik scripts yourself . Once you have the slon_tools.conf properly configured it just takes a few minutes generating all the required scripts.

Just make sure when you make any changes to your Slony cluster like adding new sets, new tables, merging sets and doing switchovers or failover… keep on updating the slon_tools.conf file regularly as then it will not take much time rebuilding all scripts for the latest cluster in case that is needed.


Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL Tags: , ,

DDL changes in mutiple Slony sets

May 8, 2009 shoaibmir 2 comments

I had a client scenario where we were replicating 16 different schemas using 16 sets with Slony and while doing DDL statements it was kind of confusing for the customer to remember which schema has what set number.

In the end I came up with a simple script that used to take input as:

executeDDL.sh schemaname ddl.sql

Here is how the script was…

STR_SCHEMA=$1
STR_FILE=$2
echo "Schema is" $STR_SCHEMAecho "Filename is" $STR_FILEnames=( scehama1
scehama2 scehama3 scehama4  scehama5 scehama6 scehama7 scehama8
scehama9 scehama10 scehama11 scehama12 scehama13 scehama14
scehama15 scehama16 )
num=1
for (( i = 0 ; i < ${#names[@]} ; i++ ))
do
nm=${names[$i]}
j=`expr $num + $i`
if [ $nm = $STR_SCHEMA ]
then
SCHEMA_ID=$j
fi
done
echo  "Set ID for Schema is" $SCHEMA_IDecho "Executing Command" "slonik_execute_script --config=slon_tools.conf $SCHEMA_ID $STR_FILE | slonik"
slonik_execute_script --config=slon_tools.conf $SCHEMA_ID $STR_FILE | slonik



Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL Tags: ,

‘public’ schema and Slony

April 8, 2009 shoaibmir 4 comments

While setting up Slony the other day I found this weird problem…

I had like 8 schemas in my database and no schema named as ‘public’  and while trying to setup Slony for that specific database, during the INIT cluster step it started giving problems like…

“No schema available to setup objects”

I had no idea that its just a case of not having ‘public’ schema in there and as soon as I added an empty public schema, everything seemed to work fine and I was able to setup 8 sets for my 8 schemas and replication worked nicely as well.

Update: Just found out the problem came as I have deleted the public schema but the user’s search_path was defaulting to ‘public’ and changing that fixed the problem.


Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL Tags: ,

Features Compatibility with Oracle

March 23, 2009 shoaibmir 8 comments

A few days back I was given the task to write a document giving the compatibility features in PostgreSQL when comparing with Oracle as a prospective customer wanted to move out of Oracle. They gave us a list of features for which we had to give some sort of compatibility in PostgreSQL.

Here is how I ended up doing it:

Feature

Comments

Procedural Language

Supported using PLPGSQL which is very simiar to PLSQL in Oracle

Windowing Functions

RANK, OVER, PARTITION BY, WITH supported for Recursive queries

Full Text Search

Proper linguistic, searching and ordering support for natural language documents

Table Paritioning

Range and list paritioning is supported

Tablespaces

Oracle like tablespaces support

Database Links

Supported using “dblinks” contribution module

Localization

Provides vast majority of character sets including multi-byte ones.

Transactions

ACID compliant and uses MVCC

GIS Support

Supported with PostGIS which uses GEOS 3.1

Query Optimization

Cost based query optimization

Large Objects Support

Options available for Oracle like BFILE, BLOB and RAW data types using pre-defined large object methods

XML Support

XML data type supported

Application Drivers

Perl, Java, C/C++, PHP .NET, Python, TCL, Delphi, Ruby, ODBC, OLEDB and Openoffice drivers

Embedded SQL

Using ECPG

Process Monitoring

Using tools like ‘ptop’

Replication and Load Balancing

Using Slony with PgPool

Recovery Mecahnism

With PITR (Point In Time Recovery)

Backup and Restore

Standard backup and restore procedures for database using pg_dump and pg_restore

Access Control

Done by controlling a configuration file where Operating System level authentication can also be used

Gateway Product support

Using DBI-Link it is possible to connect to any datasource from within PostgreSQL

Portability

Supported on most of the Unix based platforms (Solaris, RHEL, Ubuntu, SuSe, FreeBSD and many more) and Windows


Shoaib Mir
shoaibmir[@]gmail.com

Using CASE with SELECT

March 10, 2009 shoaibmir 5 comments

Today I came across an interesting example, another team at our office working with a bug tracking software needed to group the data within one of the tables on the basis of a specific component and I was able help them out with the query for doing so. It can be explained in an easier way using the dummy data…

The table and the data can be represented as:

create table testsample (ticketid int, component text, keywords text);

insert into testsample values (1, ‘JTA’, ‘Runtime’);
insert into testsample values (2, ‘JTA’, ‘MSG’);
insert into testsample values (3, ‘JTA’, ‘MSG’);
insert into testsample values (4, ‘JTA’, ‘Deploy’);
insert into testsample values (5, ‘EJB’, ‘Runtime’);
insert into testsample values (6, ‘EJB’, ‘Deploy’);
insert into testsample values (7, ‘J2EE’, ‘Runtime’);

Now we had to represent the data with groups of JTA, EJB and J2EE and then give the counts of MSG, Runtime and Deploy for each component type… SELECT with CASE came handy in this scenario as I ended up doing for a quick solution:

SELECT
component,

CASE WHEN component = ‘JTA’
THEN (select count(1) from testsample where keywords = ‘Runtime’ and component = ‘JTA’)
WHEN component = ‘EJB’
THEN (select count(1) from testsample where keywords = ‘Runtime’ and component = ‘EJB’)
WHEN component = ‘J2EE’
THEN (select count(1) from testsample where keywords = ‘Runtime’ and component = ‘J2EE’)
ELSE 0
END AS Runtime
,
CASE WHEN component = ‘JTA’
THEN (select count(1) from testsample where keywords = ‘Deploy’ and component = ‘JTA’)
WHEN component = ‘EJB’
THEN (select count(1) from testsample where keywords = ‘Deploy’ and component = ‘EJB’)
WHEN component = ‘J2EE’
THEN (select count(1) from testsample where keywords = ‘Deploy’ and component = ‘J2EE’)
ELSE 0
END AS Deploy
,
CASE WHEN component = ‘JTA’
THEN (select count(1) from testsample where keywords = ‘MSG’ and component = ‘JTA’)
WHEN component = ‘EJB’
THEN (select count(1) from testsample where keywords = ‘MSG’ and component = ‘EJB’)
WHEN component = ‘J2EE’
THEN (select count(1) from testsample where keywords = ‘MSG’ and component = ‘J2EE’)
END AS MSG
FROM testsample
group by component

And the output was…

output

Update:

David Fetter came up with a much better and cleaner solution…

SELECT
component,
SUM(CASE keywords WHEN ‘Runtime’ THEN 1 ELSE 0 END) AS Runtime,
SUM(CASE keywords WHEN ‘Deploy’ THEN 1 ELSE 0 END) AS Deploy,
SUM(CASE keywords WHEN ‘MSG’ THEN 1 ELSE 0 END) AS MSG
FROM
testsample
GROUP BY component;


Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL Tags: ,

Adding new tables to a Slony cluster

March 6, 2009 shoaibmir Leave a comment

I have been using Slony for quite a while now and its been just very reliable and consistent for me. The only thing that I find complicated with its usage is doing DDL changes (i.e adding new tables to the cluster, doing changes to tables like adding new columns etc etc..).

The tables which are already there can be easily modified using Slony EXECUTE SCRIPT for ALTER TABLE commands. But the problem comes when you need to add a new table to am existing Slony set. Right now there is no way to add the new tables to the set once the set has been created, you can only delete a table from the set but not add it. But yes there is a way around that…

- Before we start anything make sure Slony cluster is in good shape by querying “select * from $clustername.sl_status” on master node and check for any abnormal lag times. If there is nothing abnormal, then proceed with adding new tables.

- Create the new tables on all the nodes involved in the Slony cluster using PSQL.

- Now create a new set (you can call it a temporary set) in the Slony cluster with the news tables/sequences.

- Once the set is created do the subscriptions for set exactly the same as the other set for which you wanted to add your tables to originally.

- After the subscriptions have been done, merge the temporary set into your original set (merge will only work if the subscriptions for old and new set are same)

Now you have the new tables/sequences added to your original set and the temporary set is no longer in $cluster.sl_set, you can verify this by querying $cluster.sl_tables

select tab_set from _test_cluster.sl_table where tab_relname = ‘<your_new_table_name>’;

Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL Tags: , ,