This article is written in English and Portuguese
Este artigo está escrito em Inglês e PortuguêsEnglish version:
Spending a lot of time with customers is great. It gives me time to go beyond the strict task execution that short projects allow. We actually have time to dive deep into what sometimes looks only as a curiosity. I'll describe one of this situations in this post. It started out as a normal performance issue. A query involving a join was causing an hash join and apparently there was an index that could be used. Some quick investigation showed that the datatypes didn't match on the join columns, and as such the index was not being used.
The situation was fixed (using an explicit cast since changing the datatypes would require more deep analysis) and the programmer(s) received an explanation about the issue so that future situations would be avoided. Simple right? Yes... But when you have inquiring minds and programmers with a few minutes to spare you may be surprised. And this was the case. A few weeks later the DBA team received a request to explain why we recommended that the columns used in joins should have the same datatype. A programmer had produced a testcase where the engine was able to convert the parameter sent and use the index. In other words, if the engine is smart enough why should we care?!
Although this could be considered a waste of time (using the same datatypes or explicit cast is a good practice, right?!) the question was interesting enough to make us spend some time with it. In fact I had seen situations in the past where apparently sometimes the engine was smart, and others not. I never thought too much about it, since I always recommended to follow the best practices (which clearly saves us some troubles). So, personally I also had this doubt, and together with the customer DBAs we started to do some tests. We came up with a very simple test case that we though would show the problem:
DATABASE stores;
DROP TABLE IF EXISTS tst_int;
DROP TABLE IF EXISTS tst_char;
CREATE TABLE tst_int
(
c1 INTEGER,
c2 INTEGER
);
CREATE TABLE tst_char
(
c1 CHAR(15),
c2 INTEGER
);
INSERT INTO tst_int
SELECT FIRST 2000 pg_pagenum, pg_partnum FROM sysmaster:syspaghdr;
INSERT INTO tst_char
SELECT FIRST 2000 pg_pagenum, pg_partnum FROM sysmaster:syspaghdr;
CREATE INDEX i_tst_int ON tst_int(c1);
CREATE INDEX i_tst_char ON tst_char(c1);
SET EXPLAIN ON;
SELECT "tst_int with char parameter:", * FROM tst_int WHERE c1 = '12345678';
SELECT "tst_char whth int parameter",* FROM tst_char WHERE c1 = 12345678;
--- demonstrates that each index key is being casted
INSERT INTO tst_char VALUES("a1", 12345678);
SELECT * FROM tst_char WHERE c1 = 12345678;
If we take a look at the query plan we see:
QUERY: (OPTIMIZATION TIMESTAMP: 12-10-2011 23:25:19)
------
SELECT "tst_int with char parameter:", * FROM tst_int WHERE c1 = '12345678'
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) informix.tst_int: INDEX PATH
(1) Index Name: informix.i_tst_int
Index Keys: c1 (Serial, fragments: ALL)
Lower Index Filter: informix.tst_int.c1 = 12345678
So here the engine was "smart". Meaning it converted the CHAR to an INTEGER and that allowed it to use the index. Nice.
But here:
QUERY: (OPTIMIZATION TIMESTAMP: 12-10-2011 23:25:19)
------
SELECT "tst_char whth int parameter",* FROM tst_char WHERE c1 = 12345678
Estimated Cost: 84
Estimated # of Rows Returned: 1
1) informix.tst_char: SEQUENTIAL SCAN
Filters: informix.tst_char.c1 = 12345678
It looks as it's not that smart... Instead of converting the INTEGER parameter to a CHAR and use the index it decides to do the opposite: Converts all the CHARs in that column into INTEGERs and makes a SEQUENTIAL SCAN.
Since I didn't have a good explanation for thi...
Find the whole article Optimizer secrets / segredos do optimizador on the website Informix technology.
{mosgoogle}
{mos_sb_discuss:4}
| < Prev | Next > |
|---|