
Archive for June, 2010

Using subprocess for popen

June 16, 2010 1 comment

While trying to use a Python script today I came across this:

/usr/local/lib/python2.6/site-packages/londiste/repair.py:73: DeprecationWarning: os.popen4 is
deprecated.  Use the subprocess module.
s_in , s_out = os.popen4("sort --version")

The troubling code:

s_in, s_out = os.popen4("sort --version")

This is because, as of Python 2.6, ‘popen4’ is a deprecated feature, so it is better to switch to ‘subprocess’ in order to get rid of the warning message.

The replacement using ‘subprocess’ looks like this:

p = subprocess.Popen("sort --version", shell=True, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True)
(s_in, s_out) = (p.stdin, p.stdout)

To use subprocess, you will also have to add this at the top of the script:

import subprocess

Using subprocess instead of popen does the same thing and gets rid of the annoying warning.
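
Putting it all together, a minimal self-contained version would look like this (just a sketch; only the Popen lines above come from the original script, the final read is added here to show the round trip):

import subprocess

# Equivalent of os.popen4: a pipe into stdin, one combined stdout/stderr pipe out
p = subprocess.Popen("sort --version", shell=True, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     close_fds=True)
(s_in, s_out) = (p.stdin, p.stdout)

# 'sort --version' ignores stdin and just prints its banner,
# so reading the output pipe returns it once the command exits
print s_out.read()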


Shoaib Mir
shoaibmir[@]gmail.com

Categories: Python

Understanding ‘iostat’ output for database I/O loads

June 5, 2010 2 comments

From the Linux man page:

“The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates. The iostat command generates reports that can be used to change system configuration to better balance the input/output load between physical disks.”

Reports from ‘iostat’ are really useful, but I had a bit of trouble interpreting the results the first time I used it. Since then it has become my preferred go-to tool when trying to debug disk overloads.

I usually use the iostat command with the following switches:

iostat -d -x <interval>

Where…

-d = gets rid of the CPU stats so that we can concentrate on the I/O only

-x = shows extended statistics, including ‘await’ and ‘svctm’ (discussed below)

<interval> = the time in seconds between reports, so you get a new ‘iostat’ report every <interval> seconds

Let’s now see a sample output of ‘iostat’:
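
(The screenshot of the report from the original post is not reproduced here; the ‘-d -x’ format looks roughly like the following, with numbers invented purely for illustration:)

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   12.50  3.50  45.00   56.00   460.00     10.64      1.20  24.80   2.10   9.80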

If we look at stats like the above, usually we would check %util first; a value close to 100% can identify the problem on a single-disk setup, but not in the usual multi-disk scenario.

The columns we look at in order to identify the problem are:

svctm: The average service time (in milliseconds) for I/O requests that were issued to the device

await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

This basically means:

await = svctm + wait time in queue

Now using the above we can have a basic rule to identify an overloaded setup:

…if you regularly see a large difference between the values of ‘svctm’ and ‘await’, it tells you that I/O requests are going into long queue waits, and that should help you identify the problem. For example, if ‘svctm’ is around 5 ms but ‘await’ is around 200 ms, requests are spending roughly 195 ms just waiting in the queue.


Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL

TRUNCATE problems with Slony

June 3, 2010 5 comments

Slony (http://slony.info/) is great for high availability and load balancing. I have been recommending it to users for years and have not really seen any major problems with it, provided the whole replication setup has proper checks through monitoring with something like Nagios.

But there is one annoying thing that has always given me trouble: a user goes and runs a TRUNCATE on one of the master tables in order to do maintenance, and all of a sudden you start getting errors like this on the slave nodes:

2010-06-01 11:02:09 EST ERROR  remoteWorkerThread_1: "insert into "public"."table1" ("a","b") values ('1','one');

" ERROR:  duplicate key value violates unique constraint "table1_pkey"

The reason behind this is…

  • You did a TRUNCATE on a table at the master node and assumed that the statement was replicated to the slaves as well
  • TRUNCATE does not fire triggers the way INSERT/UPDATE/DELETE do (PostgreSQL only added TRUNCATE triggers in 8.4), so the slaves never got the event and still have the old copy of the table
  • The table is now empty on the master but still holds the old records on the slaves, which means that on inserting new data you might start getting duplicate key violation errors

I have used the following approach in the past to solve it…

Do a TRUNCATE on the slaves for the same table; you will then see in the Slony log files all the INSERTs/UPDATEs/DELETEs going through that had been queuing up since Slony started hitting these duplicate key violations.

In order to avoid these scenarios, you have to keep checking Slony lag times from the sl_status view of the replication schema and alert if the lag goes above a dangerous limit. If it does, go and check the Slony logs, as they usually tell the detailed story.
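
As a rough illustration, such a check could be scripted along these lines (a sketch only; the cluster schema name ‘_mycluster’, the 60-second threshold and the connection string are my assumptions, not from the original setup):

import psycopg2

# Connection details are placeholders
conn = psycopg2.connect("dbname=mydb user=slony")
cur = conn.cursor()

# sl_status lives in the Slony cluster schema, assumed here to be _mycluster
cur.execute("""
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _mycluster.sl_status
     WHERE st_lag_time > interval '60 seconds'
""")

for origin, received, events, lag in cur.fetchall():
    # A real setup would raise a Nagios alert here instead of printing
    print "WARNING: node %d -> %d lagging by %s (%d events)" % (
        origin, received, lag, events)

cur.close()
conn.close()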

Another way to avoid the TRUNCATE altogether is table partitioning: when you need to drop old data, you simply drop the specific partition (and remove it from the Slony cluster as well) instead of truncating. This also saves you a lot of routine database maintenance such as vacuuming, because you are not deleting data with DELETEs but dropping the whole partition.

PostgreSQL 8.4 and above support TRUNCATE triggers, but I am not sure whether Slony has implemented this yet or what exactly the plan is on that; it would be great to have it in Slony. I have seen that Bucardo (http://blog.endpoint.com/2009/07/bucardo-and-truncate-triggers.html) supports TRUNCATE with triggers, though.


Shoaib Mir
shoaibmir[@]gmail.com

Categories: PostgreSQL