MySQL in Operator Not Using Index

After being alerted for elevated cpu on a staging mysql instance, I logged into the mysql console and ran a SHOW PROCESS LIST :

SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 1
   User: my_user
   Host: x.x.x.x:xxx
     db: my_db
Command: Sleep
   Time: 2
  State:
   Info: NULL
*************************** 2. row ***************************
     Id: 3
   User: my_user
   Host: x.x.x.x:xxx
     db: my_db
Command: Query
   Time: 786
  State: Sending data
   Info: SELECT file_id, fs_file_id
FROM my_data
WHERE file_id IN ('123456', 123457);

A query running for 786 seconds quickly caught my eye, my next course of action was to run an EXPLAIN :


EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', 123457)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_data
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 131144420
        Extra: Using where
1 row in set (0.00 sec)

The EXPLAIN confirmed my suspicions, this query was not using an index , and ultimately caused a full table scan .

I was expecting to not find an index for the file_id field, but as you can see this field has an index by being a PRIMARY KEY :


SHOW CREATE TABLE my_data\G
*************************** 1. row ***************************
Table: my_data
Create Table: CREATE TABLE `my_data` (
`file_id` int(10) unsigned NOT NULL,
`fs_file_id` binary(16) DEFAULT NULL,
PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

Something I didn’t immediately identify earlier was the mixed types in my IN operator ’s list, they contained both integers and strings :


file_id IN ('123456', 123457)

Once I noticed the mixed type I began debugging, I finally learned the IN operator works great with sets of integers :


EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN (123456, 123457)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_data
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

It even works with sets of strings :


EXPLAIN SELECT file_id, fs_file_id FROM my_data WHERE file_id IN ('123456', '123457')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_data
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

But as we observed from the begining, the IN operator doesn’t act as expected with mixed strings and integers.