I had always been building Slony from source, but for the first time I tried using the community version from “pgrpms.org”. I was using PostgreSQL version 8.3 so it was really easy to find the correct rpms at http://www.pgrpms.org/8.3/redhat/rhel-5-x86_64/ but as I tried installing Slony package I was hit with an error:
"libpq.so.4 not found"
This looked really weird to me as I already had the libpq library installed as libpq.so.5 but it was still looking for libpq.so.4.
Looking for a solution I started going through the 8.3 RPMs and tried looking for file libpq.so.4 file…. bingooo!!! I found one which had it. compat-postgresql-libs was the answer to this problem.
After installing that package, Slony got installed successfully without any problems.
Just been away from blogging for quite a while now as work had been keeping me extremly busy but finally got sometime today to blog about an interesting problem I saw with Slony yesterday…
While at a customer site, I upgraded them to Slony version 1.2.21 and also during the process re-initialized the cluster. The whole process was done without any problem but right after everything was synched across, one of the Sysadmins saw this error in the database server log file:
"cache lookup failed for type 27889"
On doing an INSERT into sl_log_1 table.
Having never seen such a problem before with Slony, it got me a little worried as this was a production database so we had to fix it very quickly.
After doing a little bit of research found out that, by simply restarting the application server which is making connections to the database server will solve the problem. Reason being, the application server had already established a connection to database when we were dropping and re-creating the cluster and that is why it was still looking for old values that were sitting in the cache. Once we restarted the application server, it was all fine.
While trying to use a Python script today I came across this:
/usr/local/lib/python2.6/site-packages/londiste/repair.py:73: DeprecationWarning: os.popen4 is deprecated. Use the subprocess module. s_in , s_out = os.popen4("sort –version")
The troubling code:
s_in, s_out = os.popen4("sort --version")
Now this is because with Python 2.6 and above ‘popen4′ is something that is a depreciated feature, which means it will be better to change it to use ‘subprocess’ in order to get rid of the warning message.
A replacement to that is by using ‘subprocess’ as…
p = subprocess.Popen("sort --version", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True) (s_in, s_out) = (p.stdin, p.stdout)
For using subprocess, you will have to add this as well:
Using subprocess instead of popen does the same thing and gets rid of the annoying warning.
From the Linux man page:
“The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates. The iostat command generates reports that can be used to change system configuration to better balance the input/output load between physical disks.”
Reports that we get from ‘iostat’ are really useful but I myself had a little bit of trouble when trying to interpret the results while using the the first time, but since then its my preferred go-to tool when trying to debug disk overloads.
I usually use the iostat command with the following switches:
iostat –d –x <interval>
-d = gets rid of the CPU stats so that we can easily concentrate on the I/O only
-x = some additional info like ‘await’ and ‘svctm’ (will discuss them later)
<interval> = this is time in seconds, so every number of <interval> seconds you will get a new ‘iostat’ report
Let’s now see a sample output of ‘iostat’:
If we look at stats above usually we would look at %util and if we see close to 100% it can identify the problem for a single disk setup, but not in a usual multi-disks scenario.
Columns that we look at it in order to identify the problem will be:
syvctm: The average service time (in milliseconds) for I/O requests that were issued to the device
await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
This basically means:
await = syvctm + wait time in queue
Now using the above we can have a basic rule to identify an overloaded setup:
…if you can see a lot of difference in values for ‘syvctm’ and ‘await’ every now and then, that can tell you about I/O requests being going into long waits and this should help you identify the problem.
Slony (http://slony.info/) is great for high availability/load balancing and I have been recommending it to users for years now and haven’t really seen any major problems with it if you have got proper checks on the whole replication setup using proper monitoring with something like Nagios.
But well at times there is this annoying thing that always gave me trouble when a user goes and just runs a TRUNCATE on one of the master tables in order to do maintenance and all of the sudden you start getting errors like this on the slave nodes:
2010-06-01 11:02:09 ESTERROR remoteWorkerThread_1: "insert into "public"."table1" ("a","b") values ('1','one');
" ERROR: duplicate key value violates unique constraint "table1_pkey"
The reason behind this is…
- You did a truncate on a table at master node and assumed that this statement is all replicated to slaves as well
- Truncate event is not triggered like INSERT/UPDATE/DELETE (For PostgreSQL < 8.4) so that means the slave never got it and they still have the old copy of the table
- On master the table is now empty but the slave table has old records, which means on inserting new data you might start getting duplicate key violation error
I have used the following approach in the past to solve it…
Do a TRUNCATE on slaves for the same table and you will then see all those INSERTS/DELETES/UPDATES going through in the Slony logs file that have been queued up since Slony started getting these duplicate key violation problems
In order to avoid these scenarios, you have to keep on checking Slony lag times from sl_status of replication schema and alert if it goes above the dangerous lag limit. If it does, go and check the Slony logs as they usually tell the detailed story.
Another option to avoid doing TRUNCATE is by using table partitioning, which means when you need to drop old data just get rid of the specific partition and drop it out of the Slony cluster as well instead of doing a TRUNCATE. This way it will save you a lot of database routine maintenance like vacuuming as well because you are not deleting data using DELETEs but you are now dropping the whole partition.
I have recently seen trigger support for PostgreSQL 8.4 and above but I am not sure if it’s been implemented in Slony yet or what exactly is the plan on that, but it will be great to have that with Slony. I do have seen Bucardo (http://blog.endpoint.com/2009/07/bucardo-and-truncate-triggers.html) supporting truncate with triggers though.
While working with users on custom auto-vacuum settings, one of the real problems I saw with PostgreSQL 8.3 and below was making them understand the ways to update pg_autovacuum as sometimes little mistakes could cause you a lot of trouble.
I had some annoying experiences with those column values for pg_autovacuum at times as well, like once I had mistakenly autovacuum_freeze_min_age value set to zero while autovacuum_enabled set to false the auto-vacuum thread still kept doing vacuuming on that table. After doing a little bit of research found out that it had to be set to -1 so it can use the system-wide values and then it would all work. This was all a little confusing for the new users and then the other problem was as you a dump and restore of the database, all those settings are just gone so you will have to do it all over again.
I had been using pg_autovacuum with version 8.3 and just today while I was writing an upgrade plan for a client upgrading databases to 8.4, I just came across this improved custom auto-vacuum feature which lets you simply do a CREATE/ALTER TABLE command and set anything as you like for auto-vacuuming per table using the WITH option and it will also copy all these settings in the dump file (pg_dump output) as well which makes my life so easy while moving databases.
An example of it can be:
postgres=# \d test Table "public.test" Column | Type | Modifiers --------+-------------------+----------- a | integer | not null b | character varying | Indexes: "test_pkey" PRIMARY KEY, btree (a) postgres=# ALTER table test SET (autovacuum_vacuum_threshold=50, autovacuum_analyze_threshold=50); ALTER TABLE -- -- PostgreSQL database dump -- CREATE TABLE test ( a integer NOT NULL, b character varying ) WITH (autovacuum_vacuum_threshold=50, autovacuum_analyze_threshold=50); -- -- PostgreSQL database dump complete --
You can find details for the storage settings that can be set with the CREATE/ALTER TABLE at http://www.postgresql.org/docs/8.4/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
A friend at work asked for setting up Joomla as a Wiki so that the team can have a centralized place to share stuff. I had never done the Joomla setup (but had heard a lot about it at different conferences) before so was a little bit worried that it might chew up a lot of my time while doing the setup (playing around with configuration files and DB setup) as I was busy with some other critical tasks.
So finally when I got the time, I logged into a fresh Ubuntu VM that was already setup and started with installing the pre-requisite packages first which were:
– MySQL Server (it was 5.1 I guess)
– php-mysql (php connector for MySQL)
– PhpMyAdmin (in order to do MySQL database administration)
Which was all with a simple ‘apt-get’ command.
Next I downloaded the latest stable release of Joomla from http://www.joomla.org/ which was version 1.5.17. Unzipped it into /var/www/joomla folder (web folder for apache) and just pointed my browser at:
….and I was surprised to see a really nice interface for installation which will do the initial setup all automatically for you that includes setting up database tables, configs and admin user. Installation was a very simple seven step process and then I was told to get rid of the installation folder and start using my site… easyyy!!
Next thing for me was to find a way so that I can setup a Wiki in there, so I logged in as admin user to:
…and was presented with a very nice control panel for administrator and found the “Article Manager”. It was really simple to start adding articles for the Wiki in there, I added a few for for the demo in there and I could see them in the Joomla site (Wiki) right away which was just sweeeeeet! as the whole thing from setup to having a doc in Wiki was all done in just a few minutes!
Then came another requirement as I was told that we need to do attachments with the articles, I did a little bit of research and found a really nice extension for Joomla which is called “Attachments for Joomla” and can be found at –> http://extensions.joomla.org/extensions/directory-a-documentation/downloads/3115
Installation for this extension was just pointing the front-end to the downloaded extension through the administrator control panel…. so just with a click we now have this new functionality where users can add attachments as well for Wiki articles.
The dudes who wanted to use it are all happy with the setup and I am just so impressed with the simple and quick install steps that I will be recommending it to people now :)