This chapter explains about DB and its peripheral technologies related to Rails.
RDB(SQL)
MySQL had most often used, second had PostgreSQL in Rails in my experience. Rails wraps most of the SQL, but it's never ok not to know SQL.
It is better to use SQL such as INSERT
, UPDATE
, DELETE
, SELECT
~ FROM
, INNER JOIN
, LEFT JOIN
, WHERE
, GROUP
, ORDER
, LIMIT
.
It is better to know SQL such as AS
, HAVING
, CREATE TABLE
, ALTER
, INDEX
(PK
, UNIQUE
), FOREIGN KEY
, VIEW
, FUNCTION
, TRIGGER
, AUTO_INCREMENT
, USER
, ROLE
, etc.
RDBs other than MySQL
Oracle, SQL Server, PostgreSQL, SQLite, etc.
I've never seen Oracle and SQL Server used in Rails, but this is probably largely due to the license fee.
SQLite is basically does not seem to be suitable for managing large data. It is often used to manage small data or smartphone applications.
I have seen some systems PostgreSQL is used.
Differences between MySQL and PostgreSQL in Rails
- Index behavior is different.
- PostgreSQL can use
unique_by
option ofinsert_all
andupsert_all
. - PostgreSQL can not use
after
optionadd_column
. New columns will be added to the end of the table.
Others seem to include,
- PostgreSQL is better suited for subqueries and large scale development (seems to have more features).
- MySQL is relatively easier to set up and manage, and has more information.
However, I haven't noticed a big difference in Rails. About subqueries, it is incompatible with ActiveRecord in the first place, so I don't implement with subqueries very often.
Often used column naming conventions in Rails
- Abbreviations are not often used such as
_flg
,_cd
,_dt
, etc. and avoid words that make sentences longer such asis_
,can_
.- Flag columns are used with the past participle of the verb. And omit
is_
. - e.g. Use
deleted
instead ofdelete_flg
. Also useddeleted_at
. In that case, usedeleted_at.present?
as a flag. - Use
~able
instead ofcan_
. - However, In some cases,
can_
andhas_
, etc. may be easier to understand if used.
- Flag columns are used with the past participle of the verb. And omit
- Use
_at
for datetime and_on
for date.- e.g. A column that manages the expiration time is often named
expired_at
. Also namedexpires_at
if it is mainly has a future datetime.
- e.g. A column that manages the expiration time is often named
- The associated table name can be
{singular of table name}_id
to automatically used the column by Model association definition.
Habits of writing SQL in Rails
If it is needed to write SQL strings in Rails, it is good to the following.
- Table and column names are snake_case.
- SQL reserved words are UPPER CASE.
- Use HereDocs for long string.
e.g.
sql = <<~SQL
SELECT COUNT(id), status FROM users
WHERE created_at >= ?
GROUP status
SQL
If it is difficult to read when written with ActiveRecord, this policy will make it easier to read.
Leave id
column even if another unique key exists
If there is no particular reason for deleting the id
column, it is better to leave it as it is for administrative purposes to easier debugging and management.
Logical deletion
Physical deletion tends to be preferred over logical deletion.
If data restoration or confirmation is needed after deletion, it should be logged by another way other than logical deletion. In fact, statuses that will need to be restored should be managed as "inactive" rather than "deleted".
Nevertheless, I think logical deletion is not 100% unnecessary, so you can use that when you want.
There is no standard function and gem can be used. The two major gems are paranoia
and discard
, but paranoia
is now deprecated.
It will be managed with deleted_at
columns, but I think it is possible to manage it without gem.
MySQL Indexes
The following is a brief summary of what you need to know about Indexes in MySQL.
- Only one Index can be used per table, so if you want to use multiple indexes, it is better to combine them into one compound key (Multiple-Column Indexes). For the other feature conditions, the other Indexes will be prepared.
- Index is not applicable for non-forward matching LIKE searches.
- Use
.explain
to check index usage or SQL performance. - Creating too many indexes can cause performance problems, but you don't have to worry too much about it.
It is difficult to predict the behavior of large data when there is small data. So, you can create that data and verify it but, since Indexes are relatively easy to modify later, if it is not worth spending that much time from the beginning, consider modifying them after performance problems begin to appear.
Foreign Key Constraint
It is preferable to add FOREIGN KEY constraint
as much as possible. However, if the intention is to have a flexible system or table, it may be an option not use them.
It can be guaranteed that the related data exists, it will be no longer necessary to nil check, such as model.child&.name
.
For example, if you want to add FOREIGN KEY constraint
on a table with user_id
, code the following in create_table
.
create_table ~
t.references :user, index: true, foreign_key: true
end
MySQL Views, Functions, Triggers and Stored Procedures
Basically, MySQL features (such as Views, Functions, Triggers and Stored Procedures) should not be used. There is no clear rule against using these but, I have seen MySQL Views as a technical debt on some times.
I think the reasons are this.
- Different layer from Rails, so the required skills are different.
- Difficult to maintain the code, such as updating, debugging and git managing.
It is better to use scope
and merge
, etc instead of MySQL Views. And it is better to use callbacks or Model methods instead of other MySQL features.
MySQL Functions may be considered as a special case to improve processing speed during bulk insert/update. I think other features(Views, Triggers and Stored Procedures) are basically unnecessary in Rails.
Multiple DBs in Rails
Replication
Replication configurations may be created for load protection like master-slave(main-replica). This book does not explain it in detail. You can see the following If interested.
Rails Guides - Multiple Databases with Active Record
DB wont be divided by purposes
I've seen management systems as native Windows apps that separate DBs such as master tables, transaction data tables, log tables, etc., but I've never seen a Rails system do so. I think the reason for this is that it is incompatible with Rails (ActiveRecord). Instead, namespace can be used in the model.
Namespacing has explained in the Model chapter, please see that.
General-purpose tables are not often created in Rails
This is probably due to avoid the fat model. General-purpose tables may have multiple role and lead to more complicated implementations such as Polymorphic Association.
This may be an anti-pattern and should not be used basically, but it is difficult to say whether it is absolutely prohibited or not, since there may be cases where it would be cleaner to do so, depending on the requirements.
Rails Guides - Polymorphic Associations
Ridgepole
The gem ridgepole
is being used increasingly and I highly recommend it. This is used instead of Migrate.
While Migrate needs up/down diff updates to the current DB structure, Ridgepole has only the final version schema. Since it automatically diff updates (such as add/remove columns) when updating the DB, branch switching work is minimized. So, it is suitable for multiple tasks in parallel. Due to the DB update commands are different, it is necessary to change the commands (such as environment creation, deployment) when introducing it, but the introducing advantage is great.
ER Diagram
I have hardly used it and am not familiar with the competitions or options, but I will introduce a few.
Output from DB
Gem rails-erd
seems to be relatively major. A command erd
creates a reasonably good looking ER diagram by default. It may be hard if you have a lot of tables.
When considering a new DB structure
A free Web service dbdiagram.io was easy to use. There are likely to be other competitors.
Seed
A command rails db:seed
will run db/seed.rb
. Its implementation varies with each project.
Preferred Implementation Policies
- Has each table data files as yaml.
- Has idempotency (No errors and no duplicate data how many times run).
- Can insert each data for each environment.
- Calls defined class/method instead of writing code directly to the file.
- If there is your implementation policy for Service classes, use it to improve maintainability.
seed_fu
is no longer maintained
The major gem for seed data was seed_fu
, but it has not been maintained since 2018, so it may be not the best decision to adopt it in a new project.
Since Seed can run with a specific file, it is enough to create a new Seed file for additional data and execute Seed with the file.
Often used methods for bulk insert/update
- If no need validations and callbacks, or big data, these methods will be used
insert_all
,upsert_all
, orfind_or_initialize_by
,find_or_create_by
. - If need validations or callbacks ,or small data, , these methods will be used
find_or_initialize_by
,find_or_create_by
, etc. are often used.
Since INSERT
one by one is inevitably slow, it may be possible to validate all data first and then use insert_all
. In such a case, it is necessary to consider how the callback should be handled (i.e., whether it should be applied manually afterwards).
This kind of thinking is good for processes other than seed
, such as Rake Tasks, job, CSV import.
Rake Tasks
Use Rake Tasks when it is needed to create associated data, update, delete and extract data, etc.
Rails Guide - Custom Rake Tasks
The rake files are placed in lib/tasks
. Although it is possible to run with only the task
definition, it is recommended that you also write namespace
and desc
(description of the process) to make it easier to read.
Define a module and call it
- It can use with
def
,module
andclass
- If the file larger, I recommend to define as a
module
orclass
and to be call it. - Although there are few cases where
def
is a global definition even if it is written innamespace
. - Use your implementation policy for a service class if exists, it is easier to debug and read it.
Separate one-time files
For rake files that will be used once, create a directory named (e.g. once_scripts/
) so that when anyone look at them later, it sould be easy to see they are no longer needed.
You can implement it as a rails runnner
, although I don't use it.
NoSQLs
MongoDB / Elasticsearch / Redis
Consider introducing them to mass data management and load balancing, mainly in B2C system. If only full-text search is required, MySQL's N-gram (FULLTEXT INDEX) may be enough.
I have not used NoSQL much, so I am not sure about the advantages and disadvantages of Redis and MongoDB, except that ElasticSearch is good for searching.
Note that NoSQL is not a replacement for RDB, as its role is different from RDB. If for the static file, consider using CDN services.