MySQL in Operator Not Using Index
After being alerted for elevated cpu on a staging mysql instance, I logged into the mysql console and ran a SHOW PROCESS LIST :
SHOW PROCESSLIST\G
*************************** 1. row ***************************
Id: 1
User: my_user
Host: x.x.x.x:xxx
db: my_db
Command: Sleep
Time: 2
State:
Info: NULL
*************************** 2. row ***************************
Id: 3
User: my_user
Host: x.x.x.x:xxx
db: my_db
Command: Query
Time: 786
State: Sending data
Info: SELECT file_id, fs_file_id
FROM my_data
WHERE file_id IN ('123456', 123457);
A query running for 786 seconds quickly caught my eye, my next course of action was to run an EXPLAIN :
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', 123457)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: my_data
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 131144420
Extra: Using where
1 row in set (0.00 sec)
The EXPLAIN confirmed my suspicions, this query was not using an index , and ultimately caused a full table scan .
I was expecting to not find an index for the file_id field, but as you can see this field has an index by being a PRIMARY KEY :
SHOW CREATE TABLE my_data\G
*************************** 1. row ***************************
Table: my_data
Create Table: CREATE TABLE `my_data` (
`file_id` int(10) unsigned NOT NULL,
`fs_file_id` binary(16) DEFAULT NULL,
PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
Something I didn’t immediately identify earlier was the mixed types in my IN operator ’s list, they contained both integers and strings :
file_id IN ('123456', 123457)
Once I noticed the mixed type I began debugging, I finally learned the IN operator works great with sets of integers :
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN (123456, 123457)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: my_data
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where
1 row in set (0.00 sec)
It even works with sets of strings :
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', '123457')\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: my_data
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where
1 row in set (0.00 sec)
But as we observed from the begining, the IN operator doesn’t act as expected with mixed strings and integers.