Optimizing SQL with LIKE expression – 2

In my previous post I introduced 4 number of cases about how to optimize SQL with LIKE expression on oracle database. Even if I thought that I didn’t completely extend some more important details to describe/discuss those optimization skills particularly in previous case 2 and 4 as follows.

  • %qw
  • q%w

Now I am going to respectively talk about them. Here’s a demo quickly building a test table “person” on my oracle 19.3 environment (same as my previous post).

CREATE TABLE person
( id   NUMBER GENERATED ALWAYS AS IDENTITY
, name VARCHAR2(6) NOT NULL
, sex  VARCHAR2(1) NOT NULL
, flag VARCHAR2(1)
, pwd  VARCHAR2(6)
, CONSTRAINT person_pk PRIMARY KEY(id)
);

EXEC DBMS_RANDOM.seed(0);

INSERT /*+APPEND*/ INTO person (name, sex, flag, pwd)
SELECT DBMS_RANDOM.string('A', 6) AS name
     , CASE MOD(ROWNUM, 2) WHEN 0 THEN 'F'
                           WHEN 1 THEN 'M'
       END sex
     , CASE MOD(ROWNUM, 3) WHEN 0 THEN 'T'
                           WHEN 1 THEN NULL
                           WHEN 2 THEN 'F'
       END flag
     , DBMS_RANDOM.string ('p', 6) AS pwd
FROM dual
CONNECT BY level <= 3e5;

COMMIT;

EXEC DBMS_STATS.gather_table_stats(ownname => NULL, tabname => 'PERSON');

“%qw”

This case will be very interesting. I once mentioned that there is only the introduction of Reverse Key Index on oracle online docs rather than reverse() function. Since reverse() is an undocument function so we do a comparison between reverse key index and function-based index with reverse() – observeing which one is more efficient on the execution plan of these two different SQL queries.

reverse key index VS function-based index with reverse()

Of course, the 1st one is the case with reverse key index.

CREATE INDEX idx_person_name_reverse ON person (name) REVERSE;

SET LINESIZE 300
SET PAGESIZE 150
SET SERVEROUTPUT OFF
  
ALTER SESSION SET statistics_level = all;

SELECT *
  FROM person
 WHERE name LIKE '%qw'
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

-- Plan hash value: 1493655343
-- 
-- ---------------------------------------------------------------------------------------------------
-- | Id  | Operation         | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-- ---------------------------------------------------------------------------------------------------
-- |   0 | SELECT STATEMENT  |        |      1 |        |   327 (100)|    113 |00:00:00.06 |    1160 |
-- |*  1 |  TABLE ACCESS FULL| PERSON |      1 |  15000 |   327   (3)|    113 |00:00:00.06 |    1160 |
-- ---------------------------------------------------------------------------------------------------
-- 
-- Predicate Information (identified by operation id):
-- ---------------------------------------------------
-- 
--    1 - filter("NAME" LIKE '%qw')

SELECT /*+ index(person idx_person_name_reverse) */ *
  FROM person
 WHERE name LIKE '%qw'
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

-- Plan hash value: 3102595867
-- 
-- -----------------------------------------------------------------------------------------------------------------------------------------------
-- | Id  | Operation                           | Name                    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-- -----------------------------------------------------------------------------------------------------------------------------------------------
-- |   0 | SELECT STATEMENT                    |                         |      1 |        | 15757 (100)|    113 |00:00:00.20 |     875 |    753 |
-- |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON                  |      1 |  15000 | 15757   (1)|    113 |00:00:00.20 |     875 |    753 |
-- |*  2 |   INDEX FULL SCAN                   | IDX_PERSON_NAME_REVERSE |      1 |  15000 |   761   (1)|    113 |00:00:00.20 |     762 |    753 |
-- -----------------------------------------------------------------------------------------------------------------------------------------------
-- 
-- Predicate Information (identified by operation id):
-- ---------------------------------------------------
-- 
--    2 - filter("NAME" LIKE '%qw')

As you can see above those two SQL statements’ execution plan when adding the hint “/*+ index(person idx_person_name_reverse) */” on the original SQL althought optimizer chooses INDEX FULL SCAN using the reverse key index “idx_person_name_reverse” its cost value is 15757 that is apparently more expensive than 327 (the first SQL query using FTS).

The 2nd one is the case using FBI with reverse().

CREATE INDEX idx_person_name_reverse_function ON person (REVERSE(name));

SELECT *
  FROM person
 WHERE REVERSE(name) LIKE REVERSE('%qw')
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

Plan hash value: 525723249

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                             | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                                  |      1 |        |   321 (100)|    113 |00:00:00.01 |     124 |      2 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON                           |      1 |  15000 |   321   (1)|    113 |00:00:00.01 |     124 |      2 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME_REVERSE_FUNCTION |      1 |   2700 |     9   (0)|    113 |00:00:00.01 |      11 |      2 |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PERSON"."SYS_NC00006$" LIKE 'wq%')
       filter("PERSON"."SYS_NC00006$" LIKE 'wq%')

-- repeatedly running the previous SQL statement

Plan hash value: 525723249

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                             | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                                  |      1 |        |   122 (100)|    113 |00:00:00.01 |     124 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON                           |      1 |    113 |   122   (0)|    113 |00:00:00.01 |     124 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME_REVERSE_FUNCTION |      1 |    113 |     9   (0)|    113 |00:00:00.01 |      11 |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PERSON"."SYS_NC00006$" LIKE 'wq%')
       filter("PERSON"."SYS_NC00006$" LIKE 'wq%')

Note
-----
   - statistics feedback used for this statement

You might notice that I repeatedly run the previous SQL that is invoking reverse() on its WHERE clause as a result the cost value of its execution plan is 122 and optimizer luckily chooses INDEX RANGE SCAN using the function-based index with reverse()“idx_person_name_reverse_function”.

After finished comparing the execution plan of these two different types of index you’ve found that for oracle optimizer reverse key is doing too much work and too little benefit as well as fbi with reverse() is more efficient than reverse key nevertheless because of it’s undocumented you shouldn’t use it on the production environment.

One more thing, the fbi with reverse() can create an index column expression and reverse key index can’t. You can check their difference using an Oracle SDDV (Static Data Dictionary View) “user_ind_expressions”.

COLUMN index_name        FORMAT a35
COLUMN column_expression FORMAT a17

SELECT index_name
     , column_expression
FROM   user_ind_expressions
WHERE  table_name = 'PERSON'
;

INDEX_NAME                          COLUMN_EXPRESSION
----------------------------------- -----------------
IDX_PERSON_NAME_REVERSE_FUNCTION    REVERSE("NAME")

“q%w”

In this case I will use the two similar SQL queries to produce three different execution plan when creating two types of index. The following is the 1st one.

ALTER INDEX idx_person_name_reverse INVISIBLE;

ALTER INDEX idx_person_name_reverse_function INVISIBLE;

CREATE INDEX idx_person_name ON person (name);

COLUMN index_name FORMAT a35
COLUMN index_type FORMAT a22

SELECT index_name
     , index_type
     , visibility
     , clustering_factor
     , num_rows
FROM   user_indexes
WHERE  table_name = 'PERSON'
;

INDEX_NAME                          INDEX_TYPE             VISIBILIT CLUSTERING_FACTOR   NUM_ROWS
----------------------------------- ---------------------- --------- ----------------- ----------
PERSON_PK                           NORMAL                 VISIBLE                1148     300000
IDX_PERSON_NAME_REVERSE             NORMAL/REV             INVISIBLE            299731     300000
IDX_PERSON_NAME_REVERSE_FUNCTION    FUNCTION-BASED NORMAL  INVISIBLE            299731     300000
IDX_PERSON_NAME                     NORMAL                 VISIBLE              299764     300000

SELECT /*+ index(person idx_person_name) */ *
  FROM person
 WHERE name LIKE 'q%'
   AND name LIKE '%w'
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

Plan hash value: 1105672549

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |  5259 (100)|    114 |00:00:00.03 |     139 |     21 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |      1 |    262 |  5259   (1)|    114 |00:00:00.03 |     139 |     21 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME |      1 |   5244 |    16   (0)|    114 |00:00:00.02 |      25 |     21 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%')
       filter(("NAME" LIKE 'q%' AND "NAME" LIKE '%w'))

-- repeatedly running the previous SQL statement

Plan hash value: 1105672549

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |   130 (100)|    114 |00:00:00.01 |     139 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |      1 |    114 |   130   (0)|    114 |00:00:00.01 |     139 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME |      1 |    114 |    16   (0)|    114 |00:00:00.01 |      25 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%')
       filter(("NAME" LIKE 'q%' AND "NAME" LIKE '%w'))

Note
-----
   - statistics feedback used for this statement

As you can see above I’ve made both reverse key index and fbi with reverse() invisible meanwhile also created a normal index “idx_person_name” with single column “name” (on the prior highlighting line 22-24). As you can also see from that SQL statement I’ve forced optimizer to use the hint “/*+ index(person idx_person_name) */” as a result afterwards re-running that SQL query optimizer does a statistics feedback adjusting the cost, E-Rows and A-Rows to be apt (on the previous highlighting line 59, 61 and 72).

The 2nd one is that I’ll re-write the prior SQL statement (eliminating the preceding hint) – calling reverse() on the second WHERE clause “name LIKE ‘%w'”, such as, REVERSE(name) LIKE REVERSE(‘%w’). Take a closer look at its execution plan as follows.

SELECT *
  FROM person
 WHERE name LIKE 'q%'
   AND REVERSE(name) LIKE REVERSE('%w')
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

Plan hash value: 1105672549

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |   278 (100)|    114 |00:00:00.01 |     139 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |      1 |    262 |   278   (0)|    114 |00:00:00.01 |     139 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME |      1 |    262 |    16   (0)|    114 |00:00:00.01 |      25 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%')
       filter(("NAME" LIKE 'q%' AND REVERSE("NAME") LIKE 'w%'))

-- repeatedly running the previous SQL statement

Plan hash value: 1105672549

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |   130 (100)|    114 |00:00:00.01 |     139 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |      1 |    114 |   130   (0)|    114 |00:00:00.01 |     139 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NAME |      1 |    114 |    16   (0)|    114 |00:00:00.01 |      25 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%')
       filter(("NAME" LIKE 'q%' AND REVERSE("NAME") LIKE 'w%'))

Note
-----
   - statistics feedback used for this statement

You might find that above execution plan is very similar the 1st one. Yes, but their filter predicate is a bit different. Here optimizer is pretty smart to invoke reverse() in the background when choosing INDEX RANGE SCAN using the index “idx_person_name”.

P.S: Why optimizer doesn’t choose this index “idx_person_name_reverse_function” in the execution plan? Because I deliberately make it invisible in order to never be able to recognised by optimizer. Even if which is visible optimizer also won’t use it – if so, it is possible that optimizer think its cost is more expensive. Last Friday Jonathan Lewis reminds me a funny hint “index_combine(alias index1 index2 …)” on Twitter. Now we can assume that at this very moment the fbi with reverse() is visible we add the hint “/*+ index_combine(person idx_person_name idx_person_name_reverse_function) */” into the previous SQL statement, and then its execution plan won’t bring any change.

Trying my best to prompt optimizer to choose these two indexes “idx_person_name” and “idx_person_name_reverse_function” the only way we should do is creating a composite index with two parameters “name” and “reverse(name)”. This is my 3rd SQL query and its execution plan as below.

CREATE INDEX idx_person_nr ON person (name, REVERSE(name));

COLUMN index_name FORMAT a35
COLUMN index_type FORMAT a22

SELECT index_name
     , index_type
     , visibility
     , clustering_factor
     , num_rows
FROM   user_indexes
WHERE  table_name = 'PERSON'
;

INDEX_NAME                          INDEX_TYPE             VISIBILIT CLUSTERING_FACTOR   NUM_ROWS
----------------------------------- ---------------------- --------- ----------------- ----------
PERSON_PK                           NORMAL                 VISIBLE                1148     300000
IDX_PERSON_NAME_REVERSE             NORMAL/REV             INVISIBLE            299731     300000
IDX_PERSON_NAME_REVERSE_FUNCTION    FUNCTION-BASED NORMAL  INVISIBLE            299731     300000
IDX_PERSON_NAME                     NORMAL                 VISIBLE              299764     300000
IDX_PERSON_NR                       FUNCTION-BASED NORMAL  VISIBLE              299764     300000

SELECT *
  FROM person
 WHERE name LIKE 'q%'
   AND REVERSE(name) LIKE REVERSE('%w')
;

SELECT *
  FROM table(DBMS_XPLAN.display_cursor(null, null, 'cost allstats last'))
;

Plan hash value: 1070135284

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |      1 |        |    69 (100)|    114 |00:00:00.03 |     145 |     22 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON        |      1 |    262 |    69   (0)|    114 |00:00:00.03 |     145 |     22 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NR |      1 |     47 |    21   (0)|    114 |00:00:00.03 |      31 |     22 |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%' AND "PERSON"."SYS_NC00006$" LIKE 'w%')
       filter(("NAME" LIKE 'q%' AND "PERSON"."SYS_NC00006$" LIKE 'w%'))

-- repeatedly running the previous SQL statement

Plan hash value: 1070135284

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |      1 |        |   135 (100)|    114 |00:00:00.01 |     145 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| PERSON        |      1 |    114 |   135   (0)|    114 |00:00:00.01 |     145 |
|*  2 |   INDEX RANGE SCAN                  | IDX_PERSON_NR |      1 |    114 |    21   (0)|    114 |00:00:00.01 |      31 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME" LIKE 'q%' AND "PERSON"."SYS_NC00006$" LIKE 'w%')
       filter(("NAME" LIKE 'q%' AND "PERSON"."SYS_NC00006$" LIKE 'w%'))

Note
-----
   - statistics feedback used for this statement

As you can see from the previous execution plan optimizer certainly used INDEX RANGE SCAN using the index “idx_person_nr” (on the highlighting line 40 and 58). As you can also see from the prior predicate information optimizer does two steps of operation – firstly searching “q%” and secondly is “%w” whereas on “access” and “filter” section of that index.

Cheers, hope this help you again.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.