After being alerted for elevated cpu on a staging mysql instance, I logged into the mysql console and ran a SHOW PROCESS LIST:
SHOW PROCESSLIST\G *************************** 1. row *************************** Id: 1 User: my_user Host: x.x.x.x:xxx db: my_db Command: Sleep Time: 2 State: Info: NULL *************************** 2. row *************************** Id: 3 User: my_user Host: x.x.x.x:xxx db: my_db Command: Query Time: 786 State: Sending data Info: SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', 123457);
A query running for 786 seconds quickly caught my eye, my next course of action was to run an EXPLAIN:
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', 123457)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: my_data type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 131144420 Extra: Using where 1 row in set (0.00 sec)
The EXPLAIN confirmed my suspicions, this query was not using an index, and ultimately caused a full table scan.
I was expecting to not find an index for the file_id field, but as you can see this field has an index by being a PRIMARY KEY:
SHOW CREATE TABLE my_data\G *************************** 1. row *************************** Table: my_data Create Table: CREATE TABLE `my_data` ( `file_id` int(10) unsigned NOT NULL, `fs_file_id` binary(16) DEFAULT NULL, PRIMARY KEY (`file_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 1 row in set (0.00 sec)
Something I didn’t immediately identify earlier was the mixed types in my IN operator‘s list, they contained both integers and strings:
file_id IN ('123456', 123457)
Once I noticed the mixed type I began debugging, I finally learned the IN operator works great with sets of integers:
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN (123456, 123457)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: my_data type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 2 Extra: Using where 1 row in set (0.00 sec)
It even works with sets of strings:
EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', '123457')\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: my_data type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 2 Extra: Using where 1 row in set (0.00 sec)
But as we observed from the begining, the IN operator doesn’t act as expected with mixed strings and integers.