Generated Columns and ProxySQL Instead of Referenced Tables

In this post, we’ll look at how to improve queries using generated columns and ProxySQL instead of implementing a referenced table.

Developers and architects don’t always have the time or complete information to properly analyze and design a database. That is why we see tables with more fields than needed, or with incorrect types. The best solution is implementing a change in the database schema and/or application level. In this post, we’ll look at an example of generated columns (using a char field) instead of creating a referenced table, and how using generated columns and ProxySQL avoids changes at the application level.

For this example, I will be using the film table of the Sakila database (with some changes). The original film table had a language_id as tinyint, which refers to the language table:

CREATE TABLE film (
film_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
title VARCHAR(255) NOT NULL,
description TEXT DEFAULT NULL,
release_year YEAR DEFAULT NULL,
language_id TINYINT UNSIGNED NOT NULL,
original_language_id TINYINT UNSIGNED DEFAULT NULL,
.....
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE film (

film_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,

title VARCHAR(255) NOT NULL,

description TEXT DEFAULT NULL,

release_year YEAR DEFAULT NULL,

language_id TINYINT UNSIGNED NOT NULL,

original_language_id TINYINT UNSIGNED DEFAULT NULL,

.....

)ENGINE=InnoDB DEFAULT CHARSET=utf8;

I simplified the design of the table and added the language field as a char(20), as it is in table language:

CREATE TABLE `film` (
`film_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language` char(20) NOT NULL,
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

CREATE TABLE `film` (

`film_id` int(10) unsigned NOT NULL AUTO_INCREMENT,

`title` varchar(255) NOT NULL,

`description` text,

`release_year` year(4) DEFAULT NULL,

`language` char(20) NOT NULL,

`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',

PRIMARY KEY (`film_id`),

KEY `idx_title` (`title`)

) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

We inserted 1M records and executed a query that filters by language:

mysql> select count(*) from film where language ='Mandarin';
+----------+
| count(*) |
+----------+
|   256000 |
+----------+
1 row in set (0.92 sec)

mysql> explain select count(*) from film where language ='Mandarin';
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys |  key | key_len |  ref |    rows | filtered |       Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 |      SIMPLE |  film |       NULL |  ALL |          NULL | NULL |    NULL | NULL | 1014813 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.02 sec)

mysql> select count(*) from film where language ='Mandarin';

+----------+

| count(*) |

+----------+

| 256000 |

+----------+

1 row in set (0.92 sec)

mysql> explain select count(*) from film where language ='Mandarin';

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+

1 row in set, 1 warning (0.02 sec)

The explain plan shows that it is performing a full table scan. Adding an index over language speeds up the query from 0.92 sec to 0.09 sec:

mysql> ALTER TABLE film ADD INDEX (language);
Query OK, 0 rows affected (4.87 sec)
Records: 0  Duplicates: 0  Warnings: 0


mysql> select count(*) from film where language ='Mandarin';
+----------+
| count(*) |
+----------+
|   256000 |
+----------+
1 row in set (0.09 sec)

mysql> ALTER TABLE film ADD INDEX (language);

Query OK, 0 rows affected (4.87 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> select count(*) from film where language ='Mandarin';

+----------+

| count(*) |

+----------+

| 256000 |

+----------+

1 row in set (0.09 sec)

And the explain plan shows that it is using the language index:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: new_film
   partitions: NULL
         type: ref
possible_keys: language
          key: language
      key_len: 60
          ref: const
         rows: 245136
     filtered: 100.00
        Extra: Using index

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: new_film

partitions: NULL

type: ref

possible_keys: language

key: language

key_len: 60

ref: const

rows: 245136

filtered: 100.00

Extra: Using index

Now, we are going to add a generated column and an index over this new field:

ALTER TABLE film ADD COLUMN language_id tinyint GENERATED ALWAYS AS (
CASE language
      WHEN 'English'  THEN  1
      WHEN 'Italian'  THEN  2
      WHEN 'Japanese' THEN  3
      WHEN 'Mandarin' THEN  4
      ELSE 0
END
) VIRTUAL ;

ALTER TABLE film ADD INDEX (language_id);

ALTER TABLE film ADD COLUMN language_id tinyint GENERATED ALWAYS AS (

CASE language

WHEN 'English' THEN 1

WHEN 'Italian' THEN 2

WHEN 'Japanese' THEN 3

WHEN 'Mandarin' THEN 4

ELSE 0

END

) VIRTUAL ;

ALTER TABLE film ADD INDEX (language_id);

Finally, we are going to add a rule in ProxySQL with the replace pattern:

replace into mysql_query_rules
(rule_id,match_pattern,replace_pattern,apply,active)
values
(3, 
"(.*)FROM(.*)film(.*)WHERE(.*)language ='(.*)'(.*)",
"1 FROM 2 film 3 WHERE 4 language_id = (
CASE '5'
      WHEN 'English'  THEN  1
      WHEN 'Italian'  THEN  2
      WHEN 'Japanese' THEN  3
      WHEN 'Mandarin' THEN  4
      ELSE 0
END
) 6",1,1); 
LOAD MYSQL QUERY RULES TO RUNTIME;

replace into mysql_query_rules

(rule_id,match_pattern,replace_pattern,apply,active)

values

(3,

"(.*)FROM(.*)film(.*)WHERE(.*)language ='(.*)'(.*)",

"1 FROM 2 film 3 WHERE 4 language_id = (

CASE '5'

WHEN 'English' THEN 1

WHEN 'Italian' THEN 2

WHEN 'Japanese' THEN 3

WHEN 'Mandarin' THEN 4

ELSE 0

END

) 6",1,1);

LOAD MYSQL QUERY RULES TO RUNTIME;

Now, when we execute the query through ProxySQL, it will replace:

SELECT count(*) FROM film WHERE language ='Mandarin';

1	SELECT count(*) FROM film WHERE language ='Mandarin';

With:

SELECT count(*) FROM film WHERE language_id = (
CASE 'Mandarin'
 WHEN 'English' THEN 1
 WHEN 'Italian' THEN 2
 WHEN 'Japanese' THEN 3
 WHEN 'Mandarin' THEN 4
 ELSE 0
END
)

SELECT count(*) FROM film WHERE language_id = (

CASE 'Mandarin'

WHEN 'English' THEN 1

WHEN 'Italian' THEN 2

WHEN 'Japanese' THEN 3

WHEN 'Mandarin' THEN 4

ELSE 0

END

)

The explain plan shows it is using the language_id index:

mysql> explain SELECT count(*) FROM film WHERE language ='Mandarin' G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: ref
possible_keys: language_id
          key: language_id
      key_len: 2
          ref: const
         rows: 507142
     filtered: 100.00
        Extra: Using index

mysql> explain SELECT count(*) FROM film WHERE language ='Mandarin' G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: film

partitions: NULL

type: ref

possible_keys: language_id

key: language_id

key_len: 2

ref: const

rows: 507142

filtered: 100.00

Extra: Using index

And that the query time decreases to 0.06 sec:

mysql> SELECT count(*) FROM film WHERE language ='Mandarin';
+----------+
| count(*) |
+----------+
|   256000 |
+----------+
1 row in set (0.06 sec)

mysql> SELECT count(*) FROM film WHERE language ='Mandarin';

+----------+

| count(*) |

+----------+

| 256000 |

+----------+

1 row in set (0.06 sec)

This speed up (from 0.09 to 0.06) on the select is because it uses a smaller index, as you can see in the key_len of the explain plan. Performing a range scan over entries of 64 bytes is slower than performing a range scan over entries of 8 bytes (the amount of data to read is less). For instance, the index_length with just the language index is 36274176 (34.6MB), while over the language_id is 14172160 (13.5MB).

Conclusion

Generated columns are not a long-term solution. The long-term solution is to add a table, or have the application internally use a referenced table. However, I found it particularly useful or practical in two scenarios:

When database performance is dropping and change at the application level is not a priority for the business. This effort can be implemented by the DBA team without any other intervention.
When you must show the performance improvement that the change will make before modifying anything at the application level.

Cons, you need to install ProxySQL (if you are testing, you can install it just on the test server), which adds a new layer (more complex) to your infrastructure (you need to use MySQL 5.7 at least, as well). Another thing to take into account is that the writes on the table will be a bit slower because of the newly generated columns and the new indexes.

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

jonathan

6 years ago

I think you have an error here:
SELECT count(*) FROM film WHERE language_id = (
CASE ‘Mandarin’ <– should be language, no?
WHEN 'English' THEN 1
WHEN 'Italian' THEN 2
WHEN 'Japanese' THEN 3
WHEN 'Mandarin' THEN 4
ELSE 0
END
)

David Ducos

Author

Reply to jonathan

6 years ago

Hi Jonathan,
No, the query is correct. ProxySQL will change to the CASE statement, and that case will be simplified to 4, at the end, mysql will execute:
SELECT count(*) FROM film WHERE language_id = 4;
which is what we are looking for.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Generated Columns and ProxySQL Instead of Referenced Tables

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Generated Columns and ProxySQL Instead of Referenced Tables

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation