避免索引失效

索引使用

设置正确的索引

譬如在我们的employees表中，有大概三十万行的数据

+--------+------------+------------+--------------+--------+------------+
| emp_no | birth_date | first_name | last_name    | gender | hire_date  |
+--------+------------+------------+--------------+--------+------------+
|  10001 | 1953-09-02 | Georgi     | Facello      | M      | 1986-06-26 |
...

我们想找到first_name为Chirstian的结果，下面比较一下加上索引前后查询所耗费的时间，从结果我们可以看出使用索引可以使查询效率提高20倍。

mysql> select * from employees where first_name = 'Chirstian';
mysql> alter table employees add index first_name (first_name);
mysql> select * from employees where first_name = 'Chirstian';

mysql> SHOW PROFILES;
+----------+------------+---------------------------------------------------------+
| Query_ID | Duration   | Query                                                   |
+----------+------------+---------------------------------------------------------+
|        1 | 0.17415400 | select * from employees where first_name = 'Chirstian'  |
|        2 | 1.03130100 | alter table employees add index first_name (first_name) |
|        3 | 0.00869100 | select * from employees where first_name = 'Chirstian'  |
+----------+------------+---------------------------------------------------------+

遵循最左前缀匹配原则

还拿前面的employees表举例，比如我们建立一个(birth_date, first_name, last_name ) 的组合索引。

mysql> alter table employees add index bd_fn_ln (birth_date, first_name, last_name);

下面的查询是用到索引的：

mysql> select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' and last_name = 'Koblick';

mysql> select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian';

mysql> select * from employees where birth_date = '1954-05-01' and last_name = 'Koblick';

下面是这三个查询explain结果。

mysql> explain select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' and last_name = 'Koblick';

+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+

| id | select_type | table     | type | possible_keys | key      | key_len | ref               | rows | Extra                 |

+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+

|  1 | SIMPLE      | employees | ref  | bd_fn_ln      | bd_fn_ln | 97      | const,const,const |    1 | Using index condition |

+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+

1 row in set (0.00 sec)

mysql> explain select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' ;

+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+

| id | select_type | table     | type | possible_keys | key      | key_len | ref         | rows | Extra                 |

+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+

|  1 | SIMPLE      | employees | ref  | bd_fn_ln      | bd_fn_ln | 47      | const,const |    1 | Using index condition |

+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+

1 row in set (0.01 sec)

mysql> explain select * from employees where birth_date = '1954-05-01' and last_name = 'Koblick';

+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+

| id | select_type | table     | type | possible_keys | key      | key_len | ref   | rows | Extra                 |

+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+

|  1 | SIMPLE      | employees | ref  | bd_fn_ln      | bd_fn_ln | 3       | const |   60 | Using index condition |

+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+

1 row in set (0.00 sec)

虽然结果都是一条，不过前两个查询都用到了联合索引。最后一个只用到了birth_date这一个索引，所以会在birth_date = 1954-05-01 的60结果中遍历last_name来找到等于Koblick的结果。还有,如果where中都是精确匹配(使用’=‘号)，那它们的顺序不会影响索引的使用。

而下面这个查询因为没用到组合索引的最左列，所以不会用到索引而是遍历了所有的数据，这就是最左前缀匹配：

mysql> select * from employees where first_name = 'Chirstian' and last_name = 'Koblick';

+--------+------------+------------+-----------+--------+------------+

| emp_no | birth_date | first_name | last_name | gender | hire_date  |

+--------+------------+------------+-----------+--------+------------+

|  10004 | 1954-05-01 | Chirstian  | Koblick   | M      | 1986-12-01 |

+--------+------------+------------+-----------+--------+------------+

1 row in set (0.18 sec)

mysql> explain select * from employees where first_name = 'Chirstian' and last_name = 'Koblick';

+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |

+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 299468 | Using where |

+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

1 row in set (0.00 sec)

不要在比较运算符左侧使用函数或进行计算

在sql语句的比较运算符左侧使用函数或进行计算会使索引失效。


mysql> explain select * from employees where emp_no + 1 = 10005;
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 299468 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.11 sec)

mysql> explain select * from employees where emp_no = 10005-1;
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | employees | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

如果对索引字段进行了任何的表达式运算，那么其都会使索引功能失效，这是因为索引始终是一个B+树，使用其进行索引片过滤的时候是通过二分查找实现的，如果进行了计算，那么就无法使用二分查找功能，也就使得索引失效了。

-- 对索引字段使用MySQL函数（可以对等于号后的值使用，不能对字段使用）
mysql> select * from actor where lower(first_name)='rmqchuezjthp’; -- 错误
mysql> select * from actor where first_name='rMqChueZJThP'; -- 正确

-- 隐式字符串转换（这里license字段为一个varchar类型字段）
mysql> select * from actor where license=6535; -- 错误
mysql> select * from actor where license='6535'; -- 正确

-- 对索引字段使用数学表达式
mysql> select * from actor where hash_email + 2 = 4224712734; -- 错误
mysql> select * from actor where hash_email = 4224712732; -- 正确

困难谓词

不等式

mysql> select A, B, C, D from TABLE where A=a and B>b and C>c;

于上述查询，推荐使用索引 (A, B, C, D)或(A, C, B, D)，如果B和C字段的选择性，哪一个高就将其放在索引字段的前面。对于索引(A, B, C, D)，在查询时首先会根据字段A的等值条件和B的不等值条件进行索引片的过滤，然后扫描索引中B字段大于b的数据，在扫描过程中会判断获取到的数据是否满足C>c的条件，并且将符合条件的数据的D字段的值取出来，最后得到的结果集就是最终的结果集。该查询中A和B字段是参与了索引片的过滤的，而C和D字段则参与了索引覆盖扫描。

OR谓词

mysql> select A, B, C from TABLE where A>a or B>b;

推荐索引(A)和(B)，对于OR条件查询，由于并不是满足其中一个条件即可，而是两个条件只要满足一个即可。这里推荐索引为建立两个单列索引(A)和(B)，因为MySQL可以通过这两个索引进行“索引合并扫描”，也就是其首先会扫描索引A，获取其符合A>a条件的数据id，然后扫描索引B，获取其符合B>b的数据id，然后将两个扫描结果进行合并，最后通过合并的数据id在磁盘上读取数据。

对于OR谓词的索引合并扫描需要说明的是，如果需要合并的结果集非常大，或者是结果集中重复数据过多，那么进行结果集的合并将是一个非常耗时的操作，有时候效率还不及全表扫描。对于这个问题的另一个解决办法就是新建一个字段，取值1和0，标识其是否符合where条件，这样就只需要对该字段进行查询即可，也可以建立相关的索引。

IN谓词

mysql> select A, B, C from TABLE where A in (m, n, p) and B=b;

推荐索引：(A, B)，这里IN谓词严格意义上讲不是一个困难谓词，放在这里是为了借用OR谓词的索引合并扫描进行说明。对于IN谓词后的列表，MySQL会循环列表中的数据，然后分别于后续索引字段联合，比如对于上述查询，其可以拆分为(A=m and B=b) union (A=n and B=b) union (A=p and B=b)。拆分之后MySQL会首先根据A=m and B=b扫描联合索引(A, B)，获取结果集的id，然后根据A=n and B=b再次扫描该索引，依次循环，知道所有IN列表条件都扫描完成。由于IN列表条件是不重复的，因而最后扫描索引片也是不重复的，在进行结果集的合并的时候也就没有类似OR谓词的去重操作，因而查询效率非常的高。总结来说，IN谓词及其后续字段是可以使用到索引的。

最近更新于0001-01-01