Database Migrations with Flyway
As a software engineer at Capital One, I develop prototypes for our commercial bank relationship managers and analysts. My team and I recently built a web application, specifically a data management tool, to provide an interface to create and maintain data sets. The team included two software engineers, four data analysts and scientists, and a product manager. We worked cross-collaboratively throughout the first five months of prototyping to define a database schema that would meet our users’ analytical needs.
During this time, my tech lead and I decided to explore schema migration tools to ease our development as we designed our database. The one we settled on for our database was Flyway. Flyway is an open source database migration tool enabling developers to apply version control practices to their databases. While we looked at several other schema migration tools, this one seemed best suited for our particular use case.
Why did we need a schema migration tool?
Benefits of Schema Migrations and Toolings
As Agile methodologies progressed and became widely adopted in the early 2000s, the need for schema migration tools became greater. Techniques to allow a database design to develop as an application evolves allowed engineers to iterate on software more efficiently.
During this time, database migration tools rose in popularity. Schema migrations add versioning capabilities to a database. Migrations are managed incrementally and are reversible. Tools such as Flyway can prevent database schema mismatch when working with multiple environments, such as dev, test, and prod, or when switching branches. They allow developers to recreate a database from scratch, which is valuable when creating a new environment. Migration tools add assurance that an application will run with the current state of the database. This can prevent a developer from running into a column not found error, for example.
Overview of Flyway
Changes made to the database by Flyway are known as Migrations. In Flyway, Migrations have two types; they can either be versioned or repeatable, versioned being the most common.
Versioned migrations have a version, description, and a checksum and are applied in order exactly once. Repeatable migrations have a description and a checksum and are reapplied every time their checksum changes. When there’s a need to change the schema of the database, a developer adds a SQL script, usually in the db/migrations directory.
After pointing Flyway to a database, it begins scanning the filesystem of the application for migrations. The very first time it runs with the first migration, it creates a table called schema_version with several colums including version, description, script, and checksum. This table is used to keep track of the state of the database. The migrations are sorted based on their version number and executed in order.
After each migration gets applied, the schema_version table gets updated accordingly. Each time Flyway scans the file system of the application for migration, it queries the schema_version table to see what version the database is on and can upgrade the database accordingly.
Migrations that are available, but not applied, are called pending migrations.
Advantages for Our Use Case
The biggest advantage of using a migration tool is that it ensures software releases are always delivered with the matching state of the database. This was particularly important for our product since we were iterating rapidly and needed the ability to deploy features quickly and confidently in order to gather user feedback.
Integrating Flyway into an existing project is simple, as there is little configuration required for setup. The minimal steps in the process include pointing the tool to a database by specifying the url, username, and password in the config file conf/flyway.conf.
Flyway’s strict migration rules enforce discipline and hence best practices. This can be analogous to disabling force push in git. The strict rules Flyway enforces aligns with the practice of keeping SQL files immutable after deployment to an environment. As our team strives to follow best software engineering practices, Flyway was opportune for our product.
As someone with little experience with Database Migration Tools, Flyway was easy to learn and simple to use. All schema migrations are captured in increments. The API exposes six basic methods — migrate, clean, info, validate, baseline, and repair. (You can learn more about each of these commands here). These methods enabled our team to integrate Flyway into our CircleCI deployment commands.
Lastly, Flyway is both flexible and powerful — it has the ability to modify column names along with transferring data from one column to another in just a few simple commands.
Disadvantages for Our Use Case
As mentioned earlier, Flyway is strict when it comes to migration file changes. It’s not easy to change a migration after it’s been checked in. Changing a file’s content or name can cause a migration failure on every machine with a previous version of the file.
Even if a simple change needs to be made — such as adjusting a field name — a new migration script is usually required. As my tech lead and I were iterating rapidly, we found ourselves adding Flyway Scripts on a weekly basis. This resulted in a surplus of files in our migrations folder. During our first five months of prototyping, our application was deployed solely to a dev environment and there was almost no need to preserve the miniscule schema changes incrementally within our application.
It is worth mentioning it is possible to edit a previously executed migration, but only advanced users should do so as it requires updating a row in the schema_version table to adjust the checksum value.
When team members are working in parallel on different branches, version numbers can clash. There are processes to solve this issue, such as using a timestamp for the version number instead of an integer and setting the option outOfOfOrder=true.
Reversing migrations, such as dropping a column that was added in a previous migration, can be tricky, and often require the developer to create a backup of the data.
End Results for Our Application
In hindsight, my team could have better leveraged Flyway if we had integrated it later on when the application and the database design had matured. To have enabled rapid iteration within the first phase of prototyping, I think it would have been adequate for our two-person engineering team to have a single SQL file defining the initial schema, a mock data file, and a script to load our tables with the mock data set.
With this methodology, we could have made changes to the schema by modifying both the initial schema and mock data file within a git branch. After each merge to master or a deploy to an environment, through steps initiated by CircleCI, the database would dump, teardown, recreate and reload using the schema and mock data set. I think Flyway could be best utilized in an application where multiple environments exists and decisions around database changes are less volatile.
Flyway is a valuable tool and many of the advantages outweighed the disadvantages. During the development of our product, it ensured that our application was always running with the matching state of the database. When pulling changes from our master branch, the risk of running into database mismatch errors was almost none. Our migrations folder also served as record of history; our team was able to track our database design changes over time.
As my tech lead and I worked cross collaboratively with our team’s data scientists, providing transparency on the state of the database was key. With the help of CircleCi, the deployment of each migration resulted in little to no downtime on our part.
Flyway was my first experience in using a migration tool and its proven to be effective in managing the state of the database. It has the potential to be even more robust if integrated into an application at the right phase.
Acknowledgements
Thanks to Bruce Harris for his helpful suggestions during the review process.