This article is written in English and Portuguese
Este artigo está escrito em Inglês e Português


English version:


Introduction

The topic of in place ALTERed tables has always been present in Informix. We say a table was ALTERed in place when an ALTER TABLE instruction resulted in a new table schema, but the table was not physically changed. This is very helpful on large and busy tables. It means you can do the ALTER TABLE quickly (instantaneously) and with minimum immediate impact on system resources. Internally Informix does a very simple thing from the user point of view, but that can be complex from the engine point of view: A new version of the table definition is created, and from there on, any instruction (DML) affecting the data will use the new version. The table's existing pages are left on the older version but each time we SELECT them, the engine converts the row(s) to the new format. Additionally if we update a row, the page where it is stored will be written in the new format.

For example, if we have a table with lots of data pages, and we add a column, this will do an in place ALTER. Certain types of ALTERs don't allow in place ALTERs. For example, if we change a CHAR(5) column to a SMALLINT, this will be a slow ALTER. This usually happens when the system cannot immediately guarantee that the existing data can be stored in the new representation.

When the engine decides that it can do an in place ALTER, we don't have the option to inhibit it. Meaning the ALTER table will be done as an in place ALTER, and if we want to force the physical changes, we must do what we usually call a dummy UPDATE: UPDATE tabname SET column = column;

Impacts of in place ALTERs

There are several impacts on having tables with in place ALTERs. The immediate one is that your updates will be slightly slower, since the whole page will be converted and written. Note that this does not necessarily means higher I/O load, since the page is the smallest unit written by the engine. In other words, even if your table does not have more than one version of it's definition, when we change a single row of data, a whole page (at least) will be written. But the other rows of the page don't need to be changed. And changing an older version can mean your data will not fit on one page after conversion. So there can be more I/O.
Another potential issue is that like with any other part of the engine, the mechanism of in place ALTER can have issues. This is an overrated aspect, but it's true we've seen problems that only affect tables with in place ALTERs.

But the real issue with in place ALTERs refers to upgrades and downgrades. Through versions there have been great controversy regarding the real impact of doing in-place upgrades (conversions) with tables with pending in-place ALTERs. I've searched through the migration guides for several versions (7.3, 9.2, 9.4, 10, 11.1, 11.5 and 11.7) and they're almost completely consistent: You can upgrade with pending in-place ALTERed tables, but you cannot revert. The only exception I've found was the manual for 9.4 which states that you must remove the pending in-place ALTERs to run the conversion (upgrade) successfully.

The reason for allowing upgrades but not downgrades is pretty simple and acceptable: Informix guarantees that version N+1 can handle all previous situations of in-place ALTERs done in version N or older. But, since each version may have added new situations where in-place ALTERs can be done, we can't risk porting a pending ALTER to an older version that may not be able to handle it. Let me remind you that an in-place ALTER requires the ability to convert from one row version to a newer one (which can have more columns, or different data types etc.)

At the time of this writing I could not verify if the exception in the migration guide of version 9.4 was justified or simply a documentation problem. But the fact is that most people assume they must complete the ALTER tables that were done in-place before converting. By this I mean that they must eliminate all the pages in older versions. A valid reason to do that is that in case you need to revert you don't have to waste time running the dummy updates. Typically, if you have any issue in the new version and need to revert, you'll want to do it as soon as possible. As such, eliminating the pending inplace ALTERs before you upgrade can save you precious time if you really need to revert. In any case, the migration guide for version 11.7 clearly states that you only need to remove the pending inplace ALTERs, if they were caused after the conversion to 11.7. Any previous one (which already exists in the old version) would not need to be eliminated.
As a side note, let me state that the only time I've done reversions was during a customer and partner workshop where we were demoing this functionality... I never had to do it on a real customer situation. In any case the other limitations for reversion can be real challenges, so the pending in-place ALTERs wouldn't be your biggest concern.

Finding in-place alters

Now that we've seen in which situations we should remove pending inplace ALTERs (tables with pages in older versions), we come to another issue that is frequently asked, and to which there are several answers. Most of them are controversial or badly explained which again raises a lot of confusion. The issue is: How do we find tables with pending inplace ALTERs?
There was a discussion in the IIUG forum near the end of 2010 that focused on this issue. As usual, there were three answers to this:
  1. A quick way (based on SQL and SMI that tells us which tables suffered an inplace ALTER, but doesn't show which tables have pending inplace ALTERs. This means that unless you completely rebuild the table, just running dummy UPDATEs will not prevent the table from always appearing in the list generated by this method
  2. A slow way that's based in the oncheck -pT output (this effectively tells you the number of pages in each existing version This method will give you just the tables with pending inplace ALTERs
  3. A technical support tool that searches the tables metadata and can provide the answer pretty quickly. Only problem is that it's not generally available

And this was my motivation to do some research on this issue. After some information exchange with Andreas Legner from IBM technical support in Germany, I was able to create an SQL script that can find the tables with pending inplace alters. The script can give us the details about the number of pages in each version, it returns the database and table name, the version and number of pages it contains.

The great thing about this script is that it's fast (from a few seconds to a few minutes for very large databases) and it really shows you the current status. Contrary to what the option 1) above does, if you do the dummy updates on one table, that table will not show up if you run it again. One warning: If you test this, after the dummy update you need to force a checkpoint. The script goes through the partition headers (because the required info is stored there), and after the dummy updates the partition headers are written to disk only at checkpoint time.

The script comes in the form of a stored procedure and is based only on the sysmaster views. The script was tested with all the versions I could find (7.31, 9.3, 9.4, 10, 11.1, 11.5 and 11.70) and it worked without issues in all of them. So if you're upgrading an old system and want to make sure you clear any pending inplace altered tables this can be a great help.

The SQL containing the script is at the bottom of this article and I will not dig into it with great detail. The challenges I got in developing it were mainly to understand how the needed data was already present in sysmaster views and also on interpreting this data (the representation is different depending on the "endianess" of your platform. Again, for the first one the help from Andreas Legner was precious and for the second one a special thank you goes to Art Kag...

Author: This e-mail address is being protected from spambots. You need JavaScript enabled to view it (Fernando Nunes)


{mosgoogle}

{mos_sb_discuss:4}


 
Newsflash
IBM intends to offer NoSQL database capabilities inside of DB2 beginning early next year, according to a keynote speaker at IBM Information On Demand 2011.


Read more...
 

Title: Best Practices for Embedded Data Management
Date: Thursday, May 24, 2012
Time: 10:30 AM - 12:00 PM (Central Time), 16:30h Germany
Presented By: Pradeep Natarajan - Software Developer (IBM)
Registration: Best Practices for Embedded Data Management

Description:

 
A small footprint, a silent i...
Read more...
 
Ads