Vault Migrations

This article is intended for customers considering a Vault migration project. It provides suggestions and best practices for planning, developing, and executing a Vault migration.

Migration Types

A Vault migration refers to the loading of large volumes of data or documents into Vault. There are two common migration use cases:

Legacy Migration

When replacing a legacy system with Vault, it is common to migrate data from the legacy system into Vault. This type of migration usually occurs during the Vault implementation project for a new vault, but can also happen when implementing a new application in an existing vault.

Incremental Migration

Any business event which requires adding new data to Vault may call for an incremental migration, for example a phased rollout, a system consolidation, or a company acquisition. Incremental migrations occur independently of an application implementation and can affect a live production Vault application.

Approach

At a high level, migrating data into Vault will generally require:

  1. Extracting data from the source system
  2. Transforming data into the target format
  3. Cleansing data quality issues
  4. Loading data into Vault
  5. Verifying the created data

The complexity, effort, and timing of these steps vary based on the data volume, data complexity, system availability requirements, and the tools used. Choosing the correct approach is the key to a successful migration project.

It is critical that you select an approach appropriate for the size and complexity of your migration project; the image below outlines high-level guidance. It is recommended that you speak with your Veeva account team to understand these options before starting your project. Migrations can be performed by customers or with assistance from Veeva Technical Services. For large or complicated migrations, Veeva partners with multiple Certified Migration Partners who use validated tools and have experience performing Vault migrations.

Planning

The migration plan should account for dependencies between the vault configuration and migration data preparation, and allow adequate time for migration testing and validation. It is important to run performance tests so you can properly estimate how long migration activities will take.

Before carrying out any migration, you must inform Veeva by completing the Vault Migration Planning Form as soon as possible: at least three business days in advance, or at least one week for significant migrations. This notifies Veeva technical operations and support teams of your migration project and allows them to prepare so that it goes smoothly. We ask that you fill out this form for any migration that includes more than 10,000 documents, more than 1,000 large documents (greater than 1 GB, such as videos), more than 500,000 object records, or any migration where the customer or services team has concerns. You must complete the form once for each environment you will be migrating into.

Migration Best Practices

This topic identifies common practices that should be considered when migrating any type of data into Vault, including documents, objects, and configuration.

Extracting Data

Source data for a migration can come from legacy applications, file shares, spreadsheets, or even an existing vault. The details of extracting data from its source format will depend on the system itself. Often customers who are migrating from a complex source application will choose to work with a Certified Migration Partner who has experience extracting data from that application.

Batch and Delta Runs

A key consideration for data extraction is minimizing downtime during the cutover from the legacy application to Vault; often the cutover is done over a weekend. To support this, it is recommended that you migrate the majority of data or documents in batches beforehand, while the legacy system is still running, and then perform only a delta migration on the cutover weekend once the legacy system is turned off, extracting and loading just the data that has changed since the batch runs. If the target vault is already live, you can use user access control to hide the batch-loaded documents until the cutover.
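
For example, here is a minimal sketch of splitting a source extract into batch and delta sets, assuming each source record carries an ISO 8601 last-modified timestamp (the field names and cutoff value are illustrative):

    from datetime import datetime, timezone

    # Timestamp of the last pre-cutover batch run (illustrative value)
    BATCH_CUTOFF = datetime(2024, 6, 1, tzinfo=timezone.utc)

    def split_batch_and_delta(records):
        """Partition source records into the pre-cutover batch set and the
        delta set (changed since the batch run) for the cutover weekend."""
        batch, delta = [], []
        for rec in records:
            # Assumes timestamps like '2024-05-30T12:00:00+00:00'
            modified = datetime.fromisoformat(rec["last_modified"])
            (delta if modified > BATCH_CUTOFF else batch).append(rec)
        return batch, delta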

Data Transformation and Cleansing

Data extracted from the source system will need to be transformed so that it can be loaded into Vault. Vault APIs and Vault Loader accept data in comma-separated values (CSV) format. During this process it will be necessary to map data between the legacy system and Vault, taking care to review the Data Transformation Considerations below.
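
As an illustration, a transformation step might map legacy column names to Vault field names and write a UTF-8 CSV for Vault Loader; the field mapping below is hypothetical:

    import csv

    # Hypothetical mapping from legacy export columns to Vault field names
    FIELD_MAP = {"Title": "name__v", "DocType": "type__v", "LegacyId": "external_id__v"}

    def transform(in_path, out_path):
        with open(in_path, newline="", encoding="utf-8") as src, \
             open(out_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=list(FIELD_MAP.values()))
            writer.writeheader()
            for row in reader:
                # Trim whitespace during cleansing; trailing spaces cause issues
                writer.writerow({vault: row[legacy].strip()
                                 for legacy, vault in FIELD_MAP.items()})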

Transforming Data References

When populating document or object fields that reference object records or picklists, it is first necessary to ensure the reference data exists in the vault. The reference data can then be linked using either object lookup fields or picklist fields, which eliminates the need to know the system-generated IDs for related records.

Transforming Document Metadata

Mapping Document Metadata

To understand what document metadata values need populating during a migration, it’s useful to understand the structure of the Vault Data Model. This can be achieved by running the Vault Configuration Report or via the Document Metadata API.

Versioned Documents

Vault uses automatically assigned major and minor document version numbering. The major version starts at 1 and increments each time a new steady state is reached; at that point the minor version resets to 0 and then increments with each minor change. Some legacy systems allow users to manually assign their own version numbers, often starting at zero, so version numbers from the legacy system may not match those Vault assigns to migrated documents.

State Mapping

Lifecycle names and target states must be considered when mapping states. Please note that source documents in active workflows, or in an “In” state other than In Progress (In Review, In Approval, etc.), should not be migrated to Vault, as migrated documents will not have workflows applied.

Legacy Signature Pages

When Legacy Signature Pages are migrated into Vault, they must be loaded as PDF files.

Legacy Document Audit Trails

If audit trail data is required when migrating into Vault, this can be handled in a variety of ways. The recommended model is to convert the audit data into PDF format, link it to each document, and bring it in as an archive document.

Loading Data and Documents into Vault

Developing Migration Tools or Scripts

It is generally recommended that customers use either Vault Loader or a Certified Migration Partner to load data into Vault, as these tools have been tested and certified to follow best practices.

However, if you determine that you will develop your own migration tool using the Vault API you should consider the following practices:

Use Loader API or Command Line

Using the Loader API or the Loader command line allows you to automate migration tasks while taking advantage of the Loader service, which handles processing, batching, and error reporting. The Loader service is a product service that has been developed with best practices and tested by Veeva, so using the Loader API or command line can greatly reduce development time.

Use Bulk APIs

Migration should be done using Bulk APIs for both data loading and data verification. Bulk APIs allow you to create a large number of records or documents with a single API call; they are designed for higher data throughput and minimize the number of calls required. Refer to the table below to see which data types have Bulk APIs.
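
As a sketch, a bulk call to the Create Multiple Object Records endpoint might look as follows; the vault DNS, API version, and session ID are placeholders, so verify the endpoint details against the Vault REST API documentation:

    import requests

    VAULT = "https://yourvault.veevavault.com"  # placeholder vault DNS
    SESSION_ID = "..."                          # session ID from the auth endpoint

    def create_records_bulk(object_name, csv_payload):
        """Create a batch of object records in a single bulk API call."""
        resp = requests.post(
            f"{VAULT}/api/v24.1/vobjects/{object_name}",  # version is a placeholder
            headers={
                "Authorization": SESSION_ID,
                "Content-Type": "text/csv",
                "Accept": "application/json",
            },
            data=csv_payload.encode("utf-8"),
        )
        resp.raise_for_status()
        return resp.json()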

Set Client ID

In any migration that uses the Vault REST API, it’s recommended to set the Client ID so that, should any errors occur during the migration, Veeva will be better able to assist in determining what went wrong.
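
For example, assuming the X-VaultAPI-ClientID request header (check the header name against the Vault REST API documentation), the Client ID can be attached to every call:

    import requests

    SESSION_ID = "..."  # session ID placeholder

    # The client ID identifies your tool in Vault's API usage logs
    headers = {
        "Authorization": SESSION_ID,
        "X-VaultAPI-ClientID": "acme-migration-tool",  # hypothetical client ID
    }
    resp = requests.get(
        "https://yourvault.veevavault.com/api/v24.1/objects/documents",
        headers=headers,
    )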

Handle API Rate Limits

When migrating data via the Vault REST API, it’s very important to consider API rate limits: if the limits are breached and not handled correctly, integrations will be blocked from using the API. To mitigate this, bulk versions of APIs should always be used wherever possible. Migration programs should also check the limits on each API call, so that if either the burst or daily limit comes within 10% of being breached, the program either waits until capacity is available or stops the migration cleanly.
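
Here is a minimal sketch of such a check, assuming Vault’s rate-limit response headers X-VaultAPI-BurstLimitRemaining and X-VaultAPI-DailyLimitRemaining (confirm the header names for your API version):

    import time

    def enforce_rate_limits(resp, burst_limit, daily_limit, threshold=0.10):
        """Call after each API response: wait if the burst limit is close to
        being breached, or stop cleanly if the daily limit is nearly spent."""
        burst_left = int(resp.headers.get("X-VaultAPI-BurstLimitRemaining", burst_limit))
        daily_left = int(resp.headers.get("X-VaultAPI-DailyLimitRemaining", daily_limit))
        if daily_left < daily_limit * threshold:
            raise SystemExit("Within 10% of the daily API limit; stopping migration cleanly.")
        if burst_left < burst_limit * threshold:
            time.sleep(60)  # burst windows are short; wait for capacity to return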

Migration Service Account

Consider creating a user specifically for performing migration activities, so it’s clear that data creation and any related activities were done as part of the migration. Any record or document created will then clearly show that it was created by the migration account.

Migrating into a Live Vault

When migrating data into a live vault, you need to consider the impact on existing users.

Scheduling

Migration can often be a computing resource intensive process. If you are migrating large volumes of data or complex data, schedule migration activities during periods of low user activity, such as evenings or weekends.

Configuration Mode

Configuration Mode is a Vault setting which, when enabled, prevents non-Admin users from logging into the vault. Use Configuration Mode if you need to prevent end users from accessing the vault during the migration.

User Access Control

If you need to hide migrated data from existing users, you can configure user access control to hide data until the cutover.

Loading Documents

Documents can be migrated into Vault using either the Create Multiple Documents API or the Vault Loader Command Line Interface or API, following the Create Documents tutorial in Creating & Downloading Documents in Bulk.

Preload Documents to Staging

Whether loading document files into Vault using the Create Multiple Documents API or Vault Loader, it’s necessary to first upload the files to the vault file staging server. This includes the primary content files and document artifacts such as versions, renditions, and attachments. Do this as far in advance as possible, as the upload can take time. Once you call Vault Loader or one of the batch APIs, Vault carries out the file processing to create the documents in Vault.
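
As an illustrative sketch, a file might be preloaded over FTPS using Python’s standard library; the host, credentials, and staging path are all placeholders, and the connection details for your vault’s file staging area may differ:

    from ftplib import FTP_TLS

    # Upload a content file to the file staging server ahead of the migration.
    ftps = FTP_TLS("yourvault.veevavault.com")           # placeholder host
    ftps.login("migration.user@yourcompany.com", "...")  # placeholder credentials
    ftps.prot_p()  # encrypt the data channel
    with open("document.pdf", "rb") as f:
        ftps.storbinary("STOR /staging/document.pdf", f)  # placeholder path
    ftps.quit()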

The same files from sandbox test runs can also be reused for subsequent production migrations if it’s determined the files haven’t changed by re-linking the File staging area from one vault to another by contacting Support. Further details on re-linking can be found in Scalable FTP.

Migration Mode Header

Document Migration Mode is a Vault setting which loosens some of the constraints that are typically enforced, making the migration of data into Vault run more smoothly. When using the Create Multiple Documents API, this setting is enabled by passing the Document Migration Mode API header. With the Vault Loader API’s Load Data Objects endpoint, it is enabled by passing the documentmigrationmode parameter.

To use this setting the migration user must have the Vault Owner Actions : Document Migration permission in their security profile’s permission set.
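
For example, with the Create Multiple Documents API the mode might be enabled as follows; this sketch assumes the X-VaultAPI-MigrationMode header and uses placeholders for the vault DNS, API version, and session ID:

    import requests

    resp = requests.post(
        "https://yourvault.veevavault.com/api/v24.1/objects/documents/batch",
        headers={
            "Authorization": "...",              # session ID placeholder
            "Content-Type": "text/csv",
            "X-VaultAPI-MigrationMode": "true",  # enable Document Migration Mode
        },
        data=open("documents.csv", "rb"),        # CSV of document metadata
    )
    print(resp.json())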

Disable Custom Functionality

If custom functionality (such as entry actions, custom Java SDK code, or jobs) is not needed during the migration, ensure it is disabled. Also ensure that reference values such as Lists of Values (LOVs) exist and are active if they are referenced in migration data.

Document Ingestion Delay

It can take time for migrated documents to appear fully in Vault, whether in searches, renditions, or thumbnails. Therefore, when using the Vault UI to verify that documents have been migrated, bear in mind that documents which do not appear immediately may simply be subject to this ingestion delay.

Suppress Rendition Generation

It is common to suppress document rendition creation during migration into Vault, since the renditions have already been generated by the legacy source system and are migrated directly instead.

Structures (i.e. Binders)

Bulk APIs do not exist for migrating binders and folders into Vault, so sufficient time must be allocated for this to take place. Care must also be taken not to breach API rate limits. Consider whether the existing structures are still needed after migrating the data into Vault.

Vault Notifications

Note that after documents are migrated, jobs will run that send notifications to users via email, such as periodic review or expiration notices. When documents are first loaded to a vault, some users may receive a large number of emails, so it’s recommended that users of each environment be forewarned that this may occur.

Loading Objects

Object records can be migrated into Vault using either the Create Multiple Object Records API or the Vault Loader Command Line Interface (CLI) or API, following the instructions in Loading Object Records.

Record Migration Mode

Record Migration Mode allows the migration of object records in a non-initial state for objects with lifecycles. With the Create Multiple Object Records API, this setting is enabled by passing the Record Migration Mode API header. With the Vault Loader API’s Load Data Objects endpoint, it is enabled by passing the recordmigrationmode parameter.

To use this setting the migration user must have the Vault Owner Actions : Record Migration permission in their security profile’s permission sets.
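
As with documents, here is a sketch of enabling the mode on the Create Multiple Object Records API, again assuming the X-VaultAPI-MigrationMode header; the object name, file name, and connection details are placeholders:

    import requests

    # Create records directly in a non-initial lifecycle state; the CSV may
    # include a state__v column when migration mode is enabled.
    resp = requests.post(
        "https://yourvault.veevavault.com/api/v24.1/vobjects/study__v",  # placeholder object
        headers={
            "Authorization": "...",              # session ID placeholder
            "Content-Type": "text/csv",
            "X-VaultAPI-MigrationMode": "true",  # enable Record Migration Mode
        },
        data=open("studies.csv", "rb"),
    )
    print(resp.json())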

Disable Record Triggers and Actions

Record triggers execute custom Vault Java SDK code whenever a data operation occurs on an object. If this custom functionality isn’t needed during the migration, the record triggers can be disabled to prevent them from running, then re-enabled once the migration is complete.

Testing Migrations

Create Development/Sandbox Vault

A sandbox vault will need to be created from the production (or validation) vault using the Administering Sandbox Vaults tools, and any configuration customizations will need to be applied. This is typically done in conjunction with an implementation consultant. At this stage it will be possible to determine the structure of the environment into which the data will be migrated.

Reference data, such as picklists and Vault objects like Product, is included with the sandbox, but you will need to load any other reference data that your migration depends on. Test Data packages can be used to create packages of reference data.

Dry Run Migration

A dry run should be carried out to test the migration logic and data and to confirm timings. It is not necessary to dry run full data volumes; logic and timings can be validated using smaller datasets. If the migration fails, the issues should be corrected in the development environment before being reapplied to the test environment.

Data Verification

Once data has been migrated into Vault, it is necessary to verify that what has been migrated is as expected. This will involve a number of different checks, such as comparing record and document counts against the source extracts, spot-checking field values, and confirming that renditions and attachments loaded correctly.
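
For example, migrated record counts can be spot-checked against the source extract with a VQL query via the Query API; in this sketch the object name and connection details are placeholders:

    import requests

    def count_records(object_name):
        """Return the number of records in Vault for comparison with the
        source extract."""
        resp = requests.post(
            "https://yourvault.veevavault.com/api/v24.1/query",
            headers={"Authorization": "...", "Accept": "application/json"},
            data={"q": f"SELECT id FROM {object_name}"},
        )
        resp.raise_for_status()
        return resp.json()["responseDetails"]["total"]

    print(count_records("study_site__v"))  # compare with the source system count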

Data Transformation Considerations

When populating Vault metadata, several different complications can occur which will need to be handled during data transformation.

CSV Format

Any CSV files used to create or update documents using Vault Loader must use UTF-8 encoding and conform to RFC 4180.

Date Formatting

Any dates migrated into Vault must use the format YYYY-MM-DD.

Date/Time Formatting

Date/time values must use the Coordinated Universal Time (UTC) format YYYY-MM-DDTHH:MM:SS.sssZ, for example 2019-07-04T17:00:00.000Z. The value must end with the Z UTC designator, and the millisecond digits can be any value. Care should also be taken when mapping date/time fields to check that they map to the correct day, as this may differ depending on the time zone.
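
Here is a minimal example of converting a source-system local timestamp into the required UTC format while guarding against day shifts; the source time zone is illustrative:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    def to_vault_datetime(local_str, source_tz="America/New_York"):
        """Convert e.g. '2019-07-04 13:00:00' in the source time zone to
        Vault's UTC format YYYY-MM-DDTHH:MM:SS.sssZ."""
        local = datetime.strptime(local_str, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=ZoneInfo(source_tz))
        utc = local.astimezone(ZoneInfo("UTC"))
        millis = utc.microsecond // 1000
        return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

    # 13:00 New York time on 4 July is 17:00 UTC -- same day here, but a
    # later local time could roll the value into the next UTC day.
    print(to_vault_datetime("2019-07-04 13:00:00"))  # 2019-07-04T17:00:00.000Z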

Case

Some metadata in Vault is case-sensitive so may need converting to match the expected format.

Special Characters

Many migrations and loads include special characters, which is especially common outside of the USA, so care should be taken to ensure they aren’t lost. Note, however, that metadata must not contain problematic special characters such as tabs and smart quotes.

Character Encodings

Care should be taken when saving an Excel™ file to CSV for use in Loader, as Excel can corrupt the file’s character encoding in a manner that is hard to detect. If the file is corrupt it will fail to load, and failure logs containing a record for each failed row will both be emailed and appear in user notifications. The failed records will then need to be corrected and rerun, as detailed in the About Vault Loader help.
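
If an exported CSV turns out to be in a Windows code page rather than UTF-8, it can be re-encoded before loading; this sketch assumes the source encoding is cp1252, which should be verified first:

    # Re-encode a CSV saved by Excel (often cp1252 on Windows) to the UTF-8
    # encoding Vault Loader requires.
    with open("export.csv", encoding="cp1252") as src:
        text = src.read()
    with open("export_utf8.csv", "w", encoding="utf-8", newline="") as dst:
        dst.write(text)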

Language

Make sure your vault is configured to accept different languages if the data being migrated in is multilingual. Details of how to configure Vault to handle multiple languages can be found at About Language & Region Settings.

Multi-value Field Comma Separator

When mapping multi-value fields, values containing commas can be entered through quoting and escaping; for example, the escaped value "veeva,,vault" represents the single value "veeva,vault" rather than two separate values.

Windows™/MacOS™ Formatting

Data formatting can differ per environment. For instance, line separators differ between Windows™ and macOS™, so any loaded data will need checking.

Boolean Fields

When migrating Yes/No fields via the API, they must be formatted as true or false, in lowercase. This doesn’t apply to Vault Loader, which handles true and false regardless of case.

Trailing Spaces

Make sure to remove any trailing spaces, especially after any commas.

Preserve Leading Zeros

When loading numbers from string fields in the source system into string fields in Vault, make sure they are migrated as strings rather than converted to integers, to preserve leading zeros.

Unique Identifier

For migrated documents or object records where Name is not unique, or Name is system-managed, it will be necessary to set the External ID (i.e. external_id__v or external_id__c) to relate them to the original IDs used in the source system. Having this field in loader files also makes it easier to track success in the API logs.

Maximum Field Size

Migrated text values must not exceed the maximum field size, as Loader does not truncate migrated data.

References to Users and Persons

Please note that documents and objects can reference users (user__sys) and/or persons (person__sys), and these records must be active in order to be referenced. When referencing people who have left the company or have had a name change, person__sys is recommended, as it does not have to be tied to an actual user account. In addition, user names and person names are not unique, so it’s recommended that external IDs be used for these objects.

Object Lookups

Many object records have relationships with other records. For example, the Study Site object has a field called study_country__v, of data type Parent Object, which links it to the Study Country object. If you create a new Study Site record via Loader or the API and you happen to know the ID of the desired Study Country record, you can populate it; however, these IDs differ from one Vault environment to another. This means you would need a lookup table to obtain the specific Vault record IDs, using the name__v or external_id__v fields to perform the lookup. An easier alternative is to use an object lookup field in the form study_country__vr.name__v = 'United States'.
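
For example, a loader file for Study Site records might resolve the Study Country reference by name through the lookup column rather than by record ID; the other columns in this sketch are illustrative:

    import csv

    # Build a Loader CSV that uses an object lookup column instead of
    # environment-specific record IDs.
    rows = [
        {"name__v": "Site 001",
         "external_id__c": "LEG-001",
         "study_country__vr.name__v": "United States"},
    ]
    with open("study_sites.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)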