Datastore Migrator
A migration tool for non-RDBMS datastores. Support for individual datastores will be provided through plugins.
Status
WORK IN PROGRESS
Goals
To provide a way to:
- declaratively define schema migrations
- define plans, which can differ per environment
- detect the status of a plan for a target
- run a plan of migrations against a target
Terminology
- Schema: a construct in a datastore that gives structure to data, e.g. an index with mappings in a search engine like ElasticSearch, or a table with columns and/or column families in HBase
- Migration: an atomic unit of change
- Plan: a sequence of migrations
- Status: a report identifying which migrations have been applied and which are pending
- Target: a specific datastore, typically represented by a URL
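To make the terminology concrete, here is a minimal Scala sketch. It assumes each migration is identified by a string id and that a per-datastore plugin can report which ids have already been applied to a target; Target, Migration, Plan, Status, Runner, status, run, and InMemoryRunner are hypothetical names, not dsmigrator's API. It also shows two per-environment plans reusing the same migration definitions.

```scala
// Hypothetical sketch of the terminology; none of these names are dsmigrator's API.
object Model {
  // Target: a specific datastore, typically identified by a URL.
  final case class Target(url: String)

  // Migration: an atomic unit of change with a unique id (assumed).
  final case class Migration(id: String, description: String)

  // Plan: an ordered sequence of migrations.
  final case class Plan(migrations: List[Migration])

  // Status: which of a plan's migrations are applied and which are pending.
  final case class Status(applied: List[Migration], pending: List[Migration])

  // Assumed plugin interface: one implementation per datastore.
  trait Runner {
    def appliedIds(target: Target): Set[String]
    def applyMigration(target: Target, migration: Migration): Unit
  }

  // Splits the plan by whether each migration's id is already recorded on the target.
  def status(plan: Plan, target: Target, runner: Runner): Status = {
    val done = runner.appliedIds(target)
    val (applied, pending) = plan.migrations.partition(m => done.contains(m.id))
    Status(applied, pending)
  }

  // Applies pending migrations in plan order and returns the ids that were run.
  def run(plan: Plan, target: Target, runner: Runner): List[String] = {
    val pending = status(plan, target, runner).pending
    pending.foreach(m => runner.applyMigration(target, m))
    pending.map(_.id)
  }
}

import Model._

// In-memory stand-in for a real datastore plugin; it ignores the target.
final class InMemoryRunner(initial: Set[String]) extends Runner {
  private var applied: Set[String] = initial
  def appliedIds(target: Target): Set[String] = applied
  def applyMigration(target: Target, migration: Migration): Unit =
    applied += migration.id
}

// Per-environment plans built from the same migration definitions.
val createUsers = Migration("001", "create users index")
val addEmail    = Migration("002", "add email mapping")
val plans = Map(
  "staging"    -> Plan(List(createUsers, addEmail)),
  "production" -> Plan(List(createUsers))
)

val stagingTarget = Target("http://staging-es:9200") // hypothetical URL
val runner = new InMemoryRunner(Set("001"))          // pretend 001 was applied earlier
val before = status(plans("staging"), stagingTarget, runner)
val ran    = run(plans("staging"), stagingTarget, runner)
val after  = status(plans("staging"), stagingTarget, runner)
```

Because status is computed by comparing the plan's ids against the ids recorded on the target, running the same plan a second time applies nothing.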
Motivations
Here are some of the reasons not to build the same kind of tool over and over again for the various datastores, search engines, message brokers, etc. that require some kind of "schema" (used loosely here).
First, I think the activities for each of these types of stores are overwhelmingly similar from an infrastructure management perspective.
Second, separating the plan from the migrations themselves solves the problem of different environments having different needs. Here an environment might be a customer install of a single-tenant application, or a set of nodes for staging, profiling, preproduction, or production in a multi-tenant configuration.
When I started this project I wasn't aware of a tool that provided such a model for the problems I was seeing at work, where we had HBase, ElasticSearch, Redis, Kafka, etc., all of which (sometimes) needed their "schemas" managed when new application code was deployed. A tool like this frees the application from handling setup such as creating topics on message brokers (e.g. Kafka, RabbitMQ), so it can keep its focus on application logic.
Roadmap
This project is essentially at its inception, so there is a lot to do before the first working version ships. Namely:
Version 0.1 should have the ability to do the following:
- Provide Scala API to declaratively define migrations
- Provide Scala API to declaratively define plans
- Report status of a plan for the given target
- Run plan of migrations against a target
- Implement migration runner for HBase
Version 0.2 should have the following capabilities in addition to 0.1:
- Implement migration runner for ElasticSearch 0.9.x/1.x
- Provide Java-compatible API to define migrations
- Provide Java-compatible API to define plans
TODO: define milestones for subsequent versions.
Code Structure
My code is structured less conventionally than many Scala projects, so here is an outline and explanation of the structure:
- Types: I use a more functional style of coding in Scala, so I start from closed algebraic data types (usually sum, product, and recursive types) that define the basic elements of the domain. These traits (types) and their corresponding case classes/objects (value constructors) can be found in the Types trait, which is used for namespacing. See below for a description of namespace organization.
- "Classes": this does not refer to OO classes but to typeclass definitions. The trait named Classes contains typeclass definitions used to extend our basic sum, product, and recursive data types. One typeclass I am using to extend my core Action[A] type is Group[A].
- "Instances": again, this does not refer to "instances" of OO classes; rather, it refers to implementations of typeclasses for specific types. In the trait named Instances you will find a number of implicits that can be mixed in to different contexts later, which allows Scala to find the correct instance of a typeclass for a specific type based on the scope it is introduced in. More specific scopes have higher precedence, which means the default dsmigrator package instance definitions can be overridden in application/client code if necessary.
- Functions: the trait named Functions, along with the interface to our core types and typeclasses, provides the public API for the dsmigrator library/toolkit.
- Namespacing: you will notice I use traits for namespacing primitives, which I then mix into objects used for external package namespacing. View the code in src/main/scala/dsmigrator/package.scala to see how the dsmigrator.core package is constructed from traits. You might also notice inline comments in the Types trait (src/main/scala/dsmigrator/types.scala) that hint at a minor issue with scalac's support for my preferred final case style.
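The structure above can be sketched in a single file as follows. This is a hypothetical illustration, not the contents of package.scala or types.scala: the Action[A] encoding (net counts of created and dropped schema elements), the Group methods, and the combineAll and undo functions are all assumptions, chosen so that the Group instance is lawful.

```scala
// Hypothetical sketch of the trait-based layout; Action, Group, combineAll
// and undo are illustrative names, not dsmigrator's actual definitions.
object dsmigrator {

  // Types: closed algebraic data types for the domain.
  trait Types {
    // Assumed encoding: the net number of times each schema element of
    // type A has been created (positive) or dropped (negative).
    final case class Action[A](net: Map[A, Int])
  }

  // Classes: typeclass definitions.
  trait Classes {
    trait Group[G] {
      def empty: G
      def combine(x: G, y: G): G
      def inverse(x: G): G
    }
  }

  // Instances: typeclass implementations for the core types.
  trait Instances { self: Types with Classes =>
    implicit def actionGroup[A]: Group[Action[A]] = new Group[Action[A]] {
      def empty: Action[A] = Action(Map.empty[A, Int])
      // Pointwise sum of counts, dropping zeros so that combining x with
      // inverse(x) gives exactly empty.
      def combine(x: Action[A], y: Action[A]): Action[A] = {
        val keys   = x.net.keySet ++ y.net.keySet
        val summed = keys.toList.map(k => k -> (x.net.getOrElse(k, 0) + y.net.getOrElse(k, 0)))
        Action(summed.filter(_._2 != 0).toMap)
      }
      def inverse(x: Action[A]): Action[A] =
        Action(x.net.map { case (k, n) => k -> -n })
    }
  }

  // Functions: the public API, written against the typeclasses.
  trait Functions { self: Classes =>
    def combineAll[G](gs: List[G])(implicit G: Group[G]): G =
      gs.foldLeft(G.empty)(G.combine)
    def undo[G](g: G)(implicit G: Group[G]): G = G.inverse(g)
  }

  // Namespacing: the traits are mixed into an object that acts as a package.
  object core extends Types with Classes with Instances with Functions
}

import dsmigrator.core._

val createUsers = Action(Map("users" -> 1))
val dropUsers   = undo(createUsers)
val netEffect   = combineAll(List(createUsers, dropUsers))
```

Dropping zero counts in combine is what makes undo a true inverse: combining an action with its undo yields the identity rather than a list of cancelling steps.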
FAQ
- So the building blocks of my code are closed; how can we extend them? Great question. Essentially they are only closed on construction; using ad-hoc polymorphism via Scala's powerful implicits feature we can still extend these types' interfaces. The benefit here is twofold. First, we control how values of our basic types are constructed, which lets us ensure they are only ever constructed in valid forms. Second, we can still extend the effective interface of the types, so we can use them in well-defined ways later without coupling the definition of a type to all the possible ways it can be used. This contrasts with mainstream OO techniques, where interfaces must be "implemented" at the time the type is defined rather than later. Note: not all OO languages are this limited, just the most widely used ones :(
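As a minimal sketch of both points, assuming a hypothetical MigrationId type and Show typeclass (neither is part of dsmigrator): the private constructor keeps the type closed on construction, while a typeclass instance plus an implicit class add a show method afterwards without touching MigrationId's definition.

```scala
// Hypothetical sketch: MigrationId and Show are illustrative names only.
object ClosedButExtensible {
  // Closed on construction: the constructor is private, so the only way to
  // obtain a MigrationId is the validating smart constructor below.
  final class MigrationId private (val value: String)

  object MigrationId {
    // Accepts non-empty ids made only of letters, digits, '_' and '-'.
    def fromString(s: String): Option[MigrationId] =
      if (s.nonEmpty && s.forall(c => c.isLetterOrDigit || c == '_' || c == '-'))
        Some(new MigrationId(s))
      else None
  }

  // Extended later: a typeclass defined independently of MigrationId...
  trait Show[A] { def show(a: A): String }

  // ...an instance of it for MigrationId...
  implicit val migrationIdShow: Show[MigrationId] = new Show[MigrationId] {
    def show(id: MigrationId): String = s"migration:${id.value}"
  }

  // ...and syntax making .show available on any type with a Show instance.
  implicit class ShowSyntax[A](a: A) {
    def show(implicit s: Show[A]): String = s.show(a)
  }
}

import ClosedButExtensible._

val valid   = MigrationId.fromString("0001_create_users")
val invalid = MigrationId.fromString("bad id!")
val shown   = valid.map(_.show)
```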
License
This software is licensed under the BSD 3-clause license. See LICENSE file for more information.
Author(s) / Maintainer(s)
So far just me, Susan Potter.
Contributor(s)
Coming soon. Submit a single patch and see your name in flashing lights here. Or something.