A Customer's Data Journey from Legacy Systems to Documaster

Dimitar Ouzounov
Jun 5, 2019 2:51:00 PM

In this article, Documaster's CTO describes how we help our customers gather data in one place, regardless of where it was created.

Documaster was born more than five years ago when we were working on a project to help pilot customers digitize their paper archives into a modern standards-compliant records management system. Our product has evolved significantly since then, and today it serves as a single repository for the most valuable records in the organization (documents and related metadata), regardless of where they were created, and whether they existed originally on paper, or were born in the digital world.

The value of using Documaster grows with the data volume and with each new data source added to the system. With that in mind, we established several mechanisms to capture records created outside of Documaster, in various business systems and office productivity suites, and make them available in Documaster, thereby improving information governance and facilitating knowledge reuse across the organization. One such mechanism is migrating complete software systems (focusing, of course, on the valuable records in those systems) into Documaster. In this post I will describe why our customers need several of their systems to be migrated into Documaster, how a typical migration is performed, and what challenges we face during the process.

[Figure: migrations overview]

So why migrate records into Documaster? The simplest (but not the most common) use case we have encountered is a customer switching from an existing records management system to Documaster who wants to continue working with their existing data as if no switch had taken place. A more common use case is a customer with several systems, typically legacy ones, that contain important records, together with a requirement to store these records in a single place to facilitate their retrieval. Another use case is a customer who needs to store records from one or more systems in Documaster for compliance reasons. We usually see a combination of these use cases, and in all of them Documaster facilitates the export, retrieval, and long-term preservation of the migrated records.

Every migration project starts with Documaster and the customer signing a data processor agreement, which details how we (the data processor) handle the data provided by the customer (the data owner). The customer then fills out a standard questionnaire, which gives us information about the data we will be working with. It is important for us to know what to expect and to understand the context in which the data was created. Once we have reviewed the submitted questionnaire, the customer uploads a package containing the database dump and the documents of the system to be migrated to a secure location. Before work on the migration commences, a Documaster migration engineer verifies that the uploaded package is not corrupt or invalid.

The actual migration process begins with the migration engineer sending the submitted documents to an in-house developed service that converts them to PDF/A (a format suitable for long-term preservation). Conversion can take quite some time depending on the total document size, so while it runs, the migration engineer analyzes the source database and starts mapping it to the Documaster data model by writing SQL queries (some of which can be fairly complex).

The good thing is that Documaster supports a standardized data model defined in the Noark 5 standard for records management and electronic archiving, so we always map to the same target. Moreover, our data model does not include process state information, which makes it easier to migrate to Documaster than to a system that supports processes (workflows) and likely requires process state data to be migrated too. The not-so-great thing is that the source database can be anything. However, if we have already seen a database similar to the one at hand, we do not write the mapping SQL queries from scratch but reuse much of what we have used previously. This is facilitated by a tool we developed, which helps us find the similarities and differences between databases.
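To make the mapping step more concrete, here is a minimal sketch of what a single mapping query might look like. It is not Documaster's actual tooling or schema: the `legacy_case` source table, the `target_folder` table (loosely modelled on a Noark 5 folder), and all column names are hypothetical, and SQLite stands in for whatever database the source system really uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical legacy source table
    CREATE TABLE legacy_case (case_no TEXT, subject TEXT, opened_on TEXT);
    INSERT INTO legacy_case VALUES
        ('2014/17', '  Building permit ', '03.02.2014'),
        ('2015/4',  'Road maintenance',   '21.11.2015');

    -- Hypothetical target table, loosely modelled on a Noark 5 folder
    CREATE TABLE target_folder (
        folder_id    TEXT PRIMARY KEY,
        title        TEXT NOT NULL,
        created_date TEXT NOT NULL
    );
""")

# The mapping itself: rename columns, trim stray whitespace, and
# convert DD.MM.YYYY dates to ISO 8601 (YYYY-MM-DD).
MAPPING_SQL = """
    INSERT INTO target_folder (folder_id, title, created_date)
    SELECT case_no,
           TRIM(subject),
           SUBSTR(opened_on, 7, 4) || '-' ||
           SUBSTR(opened_on, 4, 2) || '-' ||
           SUBSTR(opened_on, 1, 2)
    FROM legacy_case
"""
conn.execute(MAPPING_SQL)
conn.commit()

folders = conn.execute(
    "SELECT folder_id, title, created_date FROM target_folder ORDER BY folder_id"
).fetchall()
for folder in folders:
    print(folder)
```

Because the target is always the same, a query like this can often be adapted rather than rewritten when a similar source database turns up again.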

Once the source database has been mapped to Documaster and the documents have been converted, the migration engineer runs a test migration (using an in-house developed tool), essentially copying the source metadata and documents to a fresh Documaster instance. To ensure the quality of the migrated content, we validate the migration in several ways. First, the Documaster database is validated after the migration with a set of scripts we developed. Second, the migrated content is exported from Documaster, and the resulting package is validated with our open-source Noark 5 validator. Finally, the migrated content undergoes manual verification. Once these steps are completed, we ask the customer for feedback. The migration may be repeated several times based on that feedback until it fully satisfies the customer's requirements. Once it does, the migration engineer moves the migrated data to the customer's production Documaster instance.
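One simple kind of post-migration check is a row-count reconciliation between source and target. The sketch below only illustrates the idea and is not one of the validation scripts mentioned above: the table names and the `reconcile_counts` helper are hypothetical, and SQLite is used so the example stays self-contained.

```python
import sqlite3

def count_rows(conn, table):
    # Table names come from the fixed list below, never from user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

def reconcile_counts(source, target, table_pairs):
    """Return one (source_table, target_table, source_count, target_count)
    tuple for every pair whose row counts differ."""
    mismatches = []
    for source_table, target_table in table_pairs:
        expected = count_rows(source, source_table)
        actual = count_rows(target, target_table)
        if expected != actual:
            mismatches.append((source_table, target_table, expected, actual))
    return mismatches

# Hypothetical source system with three cases and two documents
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE legacy_case (case_no TEXT);
    INSERT INTO legacy_case VALUES ('2014/17'), ('2015/4'), ('2016/9');
    CREATE TABLE legacy_doc (doc_no TEXT);
    INSERT INTO legacy_doc VALUES ('D1'), ('D2');
""")

# Hypothetical target after a test migration that lost one case
target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE target_folder (folder_id TEXT);
    INSERT INTO target_folder VALUES ('2014/17'), ('2015/4');
    CREATE TABLE target_document (document_id TEXT);
    INSERT INTO target_document VALUES ('D1'), ('D2');
""")

mismatches = reconcile_counts(
    source,
    target,
    [("legacy_case", "target_folder"), ("legacy_doc", "target_document")],
)
print(mismatches)
```

Matching counts do not prove that a migration is correct, which is why a check like this would only complement the export validation and manual verification described above.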

[Figure: migrations process]



The migration process may appear straightforward and quick, but in practice we face challenges every single day. To begin with, the customer may send us a corrupt or incorrectly generated database dump, the database may use an exotic encoding, and documents may be encrypted or stored as separate chunks in the database instead of as files, which makes them harder to work with. Low data quality is another serious problem we have observed in many of the systems we have migrated so far. Unfortunately, a large number of the systems we work with do not appear to enforce sufficient model constraint checks (as Documaster does). As a result, we often see fundamental data quality errors such as duplicate identifiers, mandatory fields containing null values, and obviously incorrect dates.

To make things worse, no two migrations are the same: even if two databases appear similar, there are always subtle differences in their structure and content related to how the original systems were used. It can also be challenging to migrate systems that do not have a standardized data model (systems not based on a standard such as Noark), as this often requires thorough and time-consuming research.
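The three data quality errors named above (duplicate identifiers, null values in mandatory fields, and obviously incorrect dates) can be detected with fairly simple checks. The following is a minimal sketch under assumed conditions, not our production tooling: the field names `id`, `title`, and `created`, the ISO date format, and the plausibility window are all hypothetical choices made for illustration.

```python
from collections import Counter
from datetime import date

def find_quality_issues(rows, mandatory=("id", "title", "created"),
                        earliest=date(1900, 1, 1), today=None):
    """Return (row_index, description) tuples for each problem found."""
    today = today or date.today()
    id_counts = Counter(
        row.get("id") for row in rows if row.get("id") is not None
    )
    issues = []
    for index, row in enumerate(rows):
        # Mandatory fields must not be null
        for field in mandatory:
            if row.get(field) is None:
                issues.append((index, f"missing {field}"))
        # Identifiers must be unique
        if id_counts.get(row.get("id"), 0) > 1:
            issues.append((index, "duplicate id"))
        # Dates must parse and fall inside a plausible window
        created = row.get("created")
        if created is not None:
            try:
                parsed = date.fromisoformat(created)
            except ValueError:
                issues.append((index, "unparseable date"))
            else:
                if not earliest <= parsed <= today:
                    issues.append((index, "implausible date"))
    return issues

# Hypothetical rows extracted from a legacy system
rows = [
    {"id": 1, "title": "Permit", "created": "2014-02-03"},
    {"id": 1, "title": "Permit copy", "created": "2014-02-04"},
    {"id": 2, "title": None, "created": "2015-11-21"},
    {"id": 3, "title": "Contract", "created": "2915-01-01"},
]
issues = find_quality_issues(rows, today=date(2019, 6, 5))
for issue in issues:
    print(issue)
```

Finding such issues is the easy part. Deciding how to correct them usually requires knowing how the original system was used.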

Despite these challenges, we gain experience with each new migration and are confident that we are getting better and faster at what we do. This confidence allows us to take on migrations at a relatively low, fixed price, and to complete them with high quality and in a reasonable amount of time.

Several years ago it was hard to imagine that migrations would play such a key role in what we do as a company. But we took a chance and established a dedicated team of engineers who specialize in this highly skilled job, which requires expertise ranging from researching unknown data models in relational databases, through understanding and correcting data quality issues, to writing complex SQL queries. Our migration engineers breathe new life into often forgotten or difficult-to-reach data, preserving it and making it available to more people, while helping organizations cut the costs of legacy software systems. We believe this is an extremely important job that creates a lot of value for our customers today and that will remain one of the cornerstones of the ever-expanding digital world we live in.

We hope you found the article interesting and useful. If you would like to read more from our CTO, he has also described the technology we use in Documaster.
