Generated ColumnsIn this post, we’ll look at how to improve queries using generated columns and ProxySQL instead of implementing a referenced table.

Developers and architects don’t always have the time or complete information to properly analyze and design a database. That is why we see tables with more fields than needed, or with incorrect types. The best solution is implementing a change in the database schema and/or application level. In this post, we’ll look at an example of generated columns (using a char field) instead of creating a referenced table, and how using generated columns and ProxySQL avoids changes at the application level.

For this example, I will be using the film table of the Sakila database (with some changes). The original film table had a language_id as tinyint, which refers to the language table:

I simplified the design of the table and added the language field as a char(20), as it is in table language:

We inserted 1M records and executed a query that filters by language:

The explain plan shows that it is performing a full table scan. Adding an index over language speeds up the query from 0.92 sec to 0.09 sec:

And the explain plan shows that it is using the language index:

Now, we are going to add a generated column and an index over this new field:

Finally, we are going to add a rule in ProxySQL with the replace pattern:

Now, when we execute the query through ProxySQL, it will replace:

With:

The explain plan shows it is using the language_id index:

And that the query time decreases to 0.06 sec:

This speed up (from 0.09 to 0.06) on the select is because it uses a smaller index, as you can see in the key_len of the explain plan. Performing a range scan over entries of 64 bytes is slower than performing a range scan over entries of 8 bytes (the amount of data to read is less). For instance, the index_length with just the language index is 36274176 (34.6MB), while over the language_id is 14172160 (13.5MB).

Conclusion

Generated columns are not a long-term solution. The long-term solution is to add a table, or have the application internally use a referenced table. However, I found it particularly useful or practical in two scenarios:

  1. When database performance is dropping and change at the application level is not a priority for the business. This effort can be implemented by the DBA team without any other intervention.
  2. When you must show the performance improvement that the change will make before modifying anything at the application level.

Cons, you need to install ProxySQL (if you are testing, you can install it just on the test server), which adds a new layer (more complex) to your infrastructure (you need to use MySQL 5.7 at least, as well). Another thing to take into account is that the writes on the table will be a bit slower because of the newly generated columns and the new indexes.

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
jonathan

I think you have an error here:
SELECT count(*) FROM film WHERE language_id = (
CASE ‘Mandarin’ <– should be language, no?
WHEN 'English' THEN 1
WHEN 'Italian' THEN 2
WHEN 'Japanese' THEN 3
WHEN 'Mandarin' THEN 4
ELSE 0
END
)