Chapter 7. DB

This chapter explains about DB and its peripheral technologies related to Rails.

RDB(SQL)

MySQL had most often used, second had PostgreSQL in Rails in my experience. Rails wraps most of the SQL, but it's never ok not to know SQL.

It is better to use SQL such as INSERT, UPDATE, DELETE, SELECT ~ FROM, INNER JOIN, LEFT JOIN, WHERE, GROUP, ORDER, LIMIT.

It is better to know SQL such as AS, HAVING, CREATE TABLE, ALTER, INDEX(PK, UNIQUE), FOREIGN KEY, VIEW, FUNCTION, TRIGGER, AUTO_INCREMENT, USER, ROLE, etc.

RDBs other than MySQL

Oracle, SQL Server, PostgreSQL, SQLite, etc.

I've never seen Oracle and SQL Server used in Rails, but this is probably largely due to the license fee.

SQLite is basically does not seem to be suitable for managing large data. It is often used to manage small data or smartphone applications.

I have seen some systems PostgreSQL is used.

Differences between MySQL and PostgreSQL in Rails

  • Index behavior is different.
  • PostgreSQL can use unique_by option of insert_all and upsert_all.
  • PostgreSQL can not use after option add_column. New columns will be added to the end of the table.

Others seem to include,

  • PostgreSQL is better suited for subqueries and large scale development (seems to have more features).
  • MySQL is relatively easier to set up and manage, and has more information.

However, I haven't noticed a big difference in Rails. About subqueries, it is incompatible with ActiveRecord in the first place, so I don't implement with subqueries very often.

Often used column naming conventions in Rails

  • Abbreviations are not often used such as _flg, _cd, _dt, etc. and avoid words that make sentences longer such as is_, can_ .
    • Flag columns are used with the past participle of the verb. And omit is_.
    • e.g. Use deleted instead of delete_flg. Also used deleted_at. In that case, use deleted_at.present? as a flag.
    • Use ~able instead of can_.
    • However, In some cases, can_ and has_ , etc. may be easier to understand if used.
  • Use _at for datetime and _on for date.
    • e.g. A column that manages the expiration time is often named expired_at. Also named expires_at if it is mainly has a future datetime.
  • The associated table name can be {singular of table name}_id to automatically used the column by Model association definition.

Habits of writing SQL in Rails

If it is needed to write SQL strings in Rails, it is good to the following.

  • Table and column names are snake_case.
  • SQL reserved words are UPPER CASE.
  • Use HereDocs for long string.

e.g.

sql = <<~SQL
  SELECT COUNT(id), status FROM users
  WHERE created_at >= ?
  GROUP status
SQL

If it is difficult to read when written with ActiveRecord, this policy will make it easier to read.

Leave id column even if another unique key exists

If there is no particular reason for deleting the id column, it is better to leave it as it is for administrative purposes to easier debugging and management.

Logical deletion

Physical deletion tends to be preferred over logical deletion.

If data restoration or confirmation is needed after deletion, it should be logged by another way other than logical deletion. In fact, statuses that will need to be restored should be managed as "inactive" rather than "deleted".

Nevertheless, I think logical deletion is not 100% unnecessary, so you can use that when you want.

There is no standard function and gem can be used. The two major gems are paranoia and discard, but paranoia is now deprecated.

Github - discard

It will be managed with deleted_at columns, but I think it is possible to manage it without gem.

MySQL Indexes

The following is a brief summary of what you need to know about Indexes in MySQL.

  • Only one Index can be used per table, so if you want to use multiple indexes, it is better to combine them into one compound key (Multiple-Column Indexes). For the other feature conditions, the other Indexes will be prepared.
  • Index is not applicable for non-forward matching LIKE searches.
  • Use .explain to check index usage or SQL performance.
  • Creating too many indexes can cause performance problems, but you don't have to worry too much about it.

It is difficult to predict the behavior of large data when there is small data. So, you can create that data and verify it but, since Indexes are relatively easy to modify later, if it is not worth spending that much time from the beginning, consider modifying them after performance problems begin to appear.

Foreign Key Constraint

It is preferable to add FOREIGN KEY constraint as much as possible. However, if the intention is to have a flexible system or table, it may be an option not use them.

It can be guaranteed that the related data exists, it will be no longer necessary to nil check, such as model.child&.name.

For example, if you want to add FOREIGN KEY constraint on a table with user_id, code the following in create_table.

create_table ~
  t.references :user, index: true, foreign_key: true
end

MySQL Views, Functions, Triggers and Stored Procedures

Basically, MySQL features (such as Views, Functions, Triggers and Stored Procedures) should not be used. There is no clear rule against using these but, I have seen MySQL Views as a technical debt on some times.

I think the reasons are this.

  • Different layer from Rails, so the required skills are different.
  • Difficult to maintain the code, such as updating, debugging and git managing.

It is better to use scope and merge, etc instead of MySQL Views. And it is better to use callbacks or Model methods instead of other MySQL features.

MySQL Functions may be considered as a special case to improve processing speed during bulk insert/update. I think other features(Views, Triggers and Stored Procedures) are basically unnecessary in Rails.

Multiple DBs in Rails

Replication

Replication configurations may be created for load protection like master-slave(main-replica). This book does not explain it in detail. You can see the following If interested.

Rails Guides - Multiple Databases with Active Record

DB wont be divided by purposes

I've seen management systems as native Windows apps that separate DBs such as master tables, transaction data tables, log tables, etc., but I've never seen a Rails system do so. I think the reason for this is that it is incompatible with Rails (ActiveRecord). Instead, namespace can be used in the model.

Namespacing has explained in the Model chapter, please see that.

General-purpose tables are not often created in Rails

This is probably due to avoid the fat model. General-purpose tables may have multiple role and lead to more complicated implementations such as Polymorphic Association.

This may be an anti-pattern and should not be used basically, but it is difficult to say whether it is absolutely prohibited or not, since there may be cases where it would be cleaner to do so, depending on the requirements.

Rails Guides - Polymorphic Associations

Ridgepole

The gem ridgepole is being used increasingly and I highly recommend it. This is used instead of Migrate.

While Migrate needs up/down diff updates to the current DB structure, Ridgepole has only the final version schema. Since it automatically diff updates (such as add/remove columns) when updating the DB, branch switching work is minimized. So, it is suitable for multiple tasks in parallel. Due to the DB update commands are different, it is necessary to change the commands (such as environment creation, deployment) when introducing it, but the introducing advantage is great.

Github - ridgepole

ER Diagram

I have hardly used it and am not familiar with the competitions or options, but I will introduce a few.

Output from DB

Gem rails-erd seems to be relatively major. A command erd creates a reasonably good looking ER diagram by default. It may be hard if you have a lot of tables.

Github - rails-erd

When considering a new DB structure

A free Web service dbdiagram.io was easy to use. There are likely to be other competitors.

dbdiagram.io

Seed

A command rails db:seed will run db/seed.rb. Its implementation varies with each project.

Preferred Implementation Policies

  • Has each table data files as yaml.
  • Has idempotency (No errors and no duplicate data how many times run).
  • Can insert each data for each environment.
  • Calls defined class/method instead of writing code directly to the file.
    • If there is your implementation policy for Service classes, use it to improve maintainability.

seed_fu is no longer maintained

The major gem for seed data was seed_fu, but it has not been maintained since 2018, so it may be not the best decision to adopt it in a new project.

Since Seed can run with a specific file, it is enough to create a new Seed file for additional data and execute Seed with the file.

Often used methods for bulk insert/update

  • If no need validations and callbacks, or big data, these methods will be used insert_all, upsert_all, or find_or_initialize_by, find_or_create_by.
  • If need validations or callbacks ,or small data, , these methods will be used find_or_initialize_by, find_or_create_by, etc. are often used.

Since INSERT one by one is inevitably slow, it may be possible to validate all data first and then use insert_all. In such a case, it is necessary to consider how the callback should be handled (i.e., whether it should be applied manually afterwards).

This kind of thinking is good for processes other than seed, such as Rake Tasks, job, CSV import.

Rake Tasks

Use Rake Tasks when it is needed to create associated data, update, delete and extract data, etc.

Rails Guide - Custom Rake Tasks

The rake files are placed in lib/tasks. Although it is possible to run with only the task definition, it is recommended that you also write namespace and desc (description of the process) to make it easier to read.

Define a module and call it

  • It can use with def, module and class
  • If the file larger, I recommend to define as a module or class and to be call it.
  • Although there are few cases where def is a global definition even if it is written in namespace.
  • Use your implementation policy for a service class if exists, it is easier to debug and read it.

Separate one-time files

For rake files that will be used once, create a directory named (e.g. once_scripts/) so that when anyone look at them later, it sould be easy to see they are no longer needed.

You can implement it as a rails runnner, although I don't use it.

NoSQLs

MongoDB / Elasticsearch / Redis

Consider introducing them to mass data management and load balancing, mainly in B2C system. If only full-text search is required, MySQL's N-gram (FULLTEXT INDEX) may be enough.

I have not used NoSQL much, so I am not sure about the advantages and disadvantages of Redis and MongoDB, except that ElasticSearch is good for searching.

Note that NoSQL is not a replacement for RDB, as its role is different from RDB. If for the static file, consider using CDN services.