Vault Migrations

This article is intended for customers considering a Vault migration project. It provides suggestions and best practices for planning, developing, and executing a Vault migration.

Migration Types

A Vault migration refers to the loading of large volumes of data or documents into Vault. There are two common migration use cases:

Legacy Migration

When replacing a legacy system with Vault, it is common to migrate data from the legacy system into Vault. This type of migration usually occurs during the Vault implementation project for a new Vault, but can also happen when implementing a new application in an existing Vault.

Incremental Migration

Any business event which requires adding new data to Vault may require incremental migration, for example, a phased rollout, system consolidation, or a company acquisition. Incremental migrations occur independently of an application implementation and can affect a live production Vault application.


At a high level, migrating data into Vault will generally require:

  1. Extracting data from source system
  2. Transforming data into target format
  3. Cleansing data quality issues
  4. Loading data into Vault
  5. Verifying created data
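
The five phases above can be sketched as composable steps. This is an illustrative outline only; the function names and the in-memory record shape are assumptions, not part of any Vault tooling.

```python
# A minimal sketch of the five migration phases. Field names such as
# name__v are examples of Vault-style field naming.

def extract(source_rows):
    """Pull raw rows from the legacy system (here: already-fetched dicts)."""
    return list(source_rows)

def transform(rows, field_map):
    """Rename legacy fields to their Vault equivalents."""
    return [{field_map.get(k, k): v for k, v in row.items()} for row in rows]

def cleanse(rows):
    """Drop rows missing required values; real cleansing is data-specific."""
    return [r for r in rows if r.get("name__v")]

def load(rows):
    """Stand-in for a Vault Loader or bulk-API call; returns created count."""
    return len(rows)

def verify(rows, created_count):
    """Confirm everything we sent was created."""
    return created_count == len(rows)

legacy = [{"title": "SOP-001"}, {"title": ""}]
clean = cleanse(transform(extract(legacy), {"title": "name__v"}))
assert verify(clean, load(clean))
```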

The complexity, effort, and timing of these steps varies based on the data volume, data complexity, system availability requirements, and tools that are used. Choosing the correct approach is the key to a successful migration project.

It is critical that you select an approach that is appropriate for the size and complexity of your migration project. The image below outlines high-level guidance based on the size and complexity of your migration. It is recommended that you speak with your Veeva account team to understand these options prior to starting your project. Migrations can be performed by customers or with assistance from Veeva Technical Services. For large or complicated migrations, Veeva partners with multiple Certified Migration Partners who use validated tools and have experience performing Vault migrations.


The migration plan should consider dependencies between the Vault configuration and migration data preparation, as well as provide adequate time for migration testing and validation. It is important that you conduct performance testing to accurately estimate how long migration activities will take.

Before carrying out any migration, you must inform Veeva as early as possible, and no later than three business days in advance (or at least one week for significant migrations), by completing the Vault Migration Planning Form. This notifies Veeva technical operations and support teams of your migration project and allows them to prepare so that it goes smoothly. We ask that you fill out this form for any migration that includes more than 10,000 documents, 1,000 large documents (greater than 1 GB, such as videos), or 500,000 object records, or for any migration where the customer or services team has concerns. You must complete the form once for each environment you will be migrating into.

Migration Best Practices

This topic identifies the common practices that should be considered when migrating any type of data into Vault, including documents, objects, or configuration.

Extracting Data

Source data for a migration can come from legacy applications, file shares, spreadsheets, or even an existing Vault. The details of extracting data from its source format will depend on the system itself. Often customers who are migrating from a complex source application will choose to work with a Certified Migration Partner who has experience extracting data from that application.

Batch and Delta Runs

A key consideration for data extraction is minimizing downtime during the cutover from the legacy application to Vault; often the cutover is done over a weekend. To support this, it is recommended that you migrate the majority of data or documents in batches beforehand, while the legacy system is still running, and then perform only a delta migration on the cutover weekend once the legacy system has been turned off, extracting and loading only the data that has changed since the batch runs. If the target Vault is already live, you can use user access control to hide the batch-loaded documents until the cutover.
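
The delta selection step can be sketched as a filter on a last-modified timestamp. The record shape and the field name modified_date are assumptions for illustration; real extracts depend on the legacy system.

```python
from datetime import datetime, timezone

# Illustrative delta extraction: after the batch loads complete, select only
# records modified since the batch cutoff timestamp.
def delta_records(records, batch_cutoff):
    return [r for r in records if r["modified_date"] > batch_cutoff]

cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": "DOC-1", "modified_date": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"id": "DOC-2", "modified_date": datetime(2024, 6, 3, tzinfo=timezone.utc)},
]
delta = delta_records(records, cutoff)  # only DOC-2 changed after the cutoff
```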

Data Transformation and Cleansing

Data extracted from the source system will need to be transformed so that it can be loaded into Vault. Vault APIs and Vault Loader accept data in comma separated values (CSV) format. During this process it will be necessary to map data between the legacy system and Vault, taking care to review these Data Transformation Considerations.
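
As a sketch of producing Loader-ready output, Python's csv module writes RFC 4180-style quoting, and opening the file with UTF-8 encoding matches the format Vault expects. The column names below are examples, not a required Loader template.

```python
import csv

# Values containing commas or quotes are quoted automatically by the csv
# module, keeping the file safe for Vault Loader.
rows = [
    {"external_id__v": "LEG-001", "name__v": 'Protocol "A", Phase 1'},
    {"external_id__v": "LEG-002", "name__v": "Protocol B"},
]
with open("documents.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["external_id__v", "name__v"])
    writer.writeheader()
    writer.writerows(rows)
```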

Transforming Data References

When populating document or object fields that reference object records or picklists, you must first ensure the reference data exists in the Vault. The reference data can then be linked using either object lookup fields or picklist fields, which eliminates the need to know the system-generated IDs of related records.

Transforming Document Metadata

Mapping Document Metadata

To understand what document metadata values need populating during a migration, it’s useful to understand the structure of the Vault Data Model. This can be achieved by running the Vault Configuration Report or via the Document Metadata API.

Versioned Documents

Vault automatically assigns major and minor document version numbers. The major version starts at 1 and increments each time a new steady state is reached; at that point the minor version resets to 0 and then increments with each minor change. Some legacy systems allow users to manually assign their own version numbers, often starting at zero, so version numbers from the legacy system may not match those of documents created in Vault.

State Mapping

Lifecycle names and target states must be considered when mapping states. Please note that source documents in workflows, or in an "In" state other than In Progress (In Review, In Approval, etc.), should not be migrated to Vault, as migrated documents will not have workflows applied.

Legacy Signature Pages

When Legacy Signature Pages are to be migrated into Vault they must be loaded as a PDF file.

Legacy Document Audit Trails

If audit trail data is required when migrating into Vault, this can be done in a variety of ways. The recommended approach is to convert the audit data into PDF format, link it to each document, and bring it in as an archive document.

Loading Data and Documents into Vault

Developing Migration Tools or Scripts

It is generally recommended that customers either use Vault Loader or a Certified Migration Partner to load data into Vault as these tools have been tested and certified to use best practices in their development.

However, if you determine that you will develop your own migration tool using the Vault API you should consider the following practices:

Use Loader API or Command Line

Using the Loader API or the Loader command line allows you to automate migration tasks while taking advantage of the Loader service, which handles processing, batching, and error reporting. Because the Loader service is a Veeva product service, it has been developed using best practices and tested by Veeva. Using the Loader API or command line can greatly reduce development time.

Use Bulk APIs

Migration should also be done using Bulk APIs for data loading and data verification. Bulk APIs allow you to create a large number of records or documents with a single API call. These APIs are designed for higher data throughput and will minimize the number of calls required. Refer to the table below to see which data types have Bulk APIs.
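
Bulk calls mean chunking your records before sending them. A minimal sketch of the batching step follows; the 500-items-per-call limit is an assumption for illustration, so check the Vault API reference for the actual per-endpoint limit.

```python
# Chunk records into bulk-sized batches so each batch becomes one API call
# instead of one call per record.
def batches(items, size=500):
    for i in range(0, len(items), size):
        yield items[i:i + size]

records = [{"name__v": f"rec-{n}"} for n in range(1200)]
calls = list(batches(records))  # 3 bulk calls instead of 1200 single calls
```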

Set Client ID

In any migration that uses the Vault REST API, it's recommended to set the Client ID so that, should any errors occur during the migration, Veeva is better able to determine what may have gone wrong.
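
The client ID is passed as a request header. A minimal sketch follows; the header name X-VaultAPI-ClientID is the documented mechanism, while the session ID, client ID value, and example URL are placeholders.

```python
# Attach the same client ID to every request so Veeva can trace the
# migration's API traffic in server logs.
headers = {
    "Authorization": "<session-id>",          # placeholder session token
    "X-VaultAPI-ClientID": "acme-legacy-migration-v1",
}
# e.g. requests.get(f"{base_url}/api/v24.1/objects/documents", headers=headers)
```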

Handle API Rate Limits

When migrating data via the Vault REST API, it is very important to account for API rate limits: if the limits are breached and not handled correctly, integrations will be blocked from using the API. To reduce the risk of breaching the limits, bulk versions of APIs should always be used wherever possible. Migration programs should check the limits on each API call, so that if either the burst or daily limit comes within a 10% threshold of being breached, the program either waits until limits are available again or cleanly stops the migration process.
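
The 10% threshold check above can be sketched as follows. Vault reports remaining allowances in response headers (X-VaultAPI-BurstLimitRemaining and X-VaultAPI-DailyLimitRemaining); the limit values themselves are illustrative, so confirm your Vault's actual limits.

```python
import time

BURST_LIMIT = 2000      # example burst-window allowance (assumption)
DAILY_LIMIT = 500_000   # example daily allowance (assumption)

def throttle(response_headers, wait_seconds=60):
    """Wait when the burst limit nears exhaustion; stop cleanly on the daily limit."""
    burst_left = int(response_headers.get("X-VaultAPI-BurstLimitRemaining", BURST_LIMIT))
    daily_left = int(response_headers.get("X-VaultAPI-DailyLimitRemaining", DAILY_LIMIT))
    if daily_left < DAILY_LIMIT * 0.10:
        raise RuntimeError("Daily limit nearly exhausted; stop and resume tomorrow.")
    if burst_left < BURST_LIMIT * 0.10:
        time.sleep(wait_seconds)  # let the burst window refill before continuing
```

Call throttle() after each API response; catching the RuntimeError gives the migration a clean stopping point it can resume from.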

Migration Service Account

Consider creating a user specifically for performing migration activities, so that it is clear which data creation and related activities were done as part of the migration. Any record or document created by this user will clearly show that it was created during the migration.

Migrating into a Live Vault

When migrating data into a live Vault, you need to consider the impact on existing users.


Migration can often be a computing resource intensive process. If you are migrating large amounts or complex data, you should schedule migration activities during periods of low user activity such as evenings or weekends.

Configuration Mode

Configuration Mode is a Vault setting which when enabled prevents non-Admin users from logging into the Vault. Use configuration mode if you need to prevent end-users from accessing the Vault during the migration.

User Access Control

If you need to hide migrated data from existing users, you can configure user access control to hide data until the cutover.

Loading Documents

Migrating documents into Vault can be done either with the Create Multiple Documents API or with the Vault Loader Command Line Interface or API, by following the Create Documents tutorial in Creating & Downloading Documents in Bulk.

Preload Documents to Staging

Whether you load document files into Vault using the Create Multiple Documents API or Vault Loader, you must first upload the files to the Vault file staging server. This includes the primary content files as well as document artifacts such as versions, renditions, and attachments. Do this as far in advance as possible, as the upload can take time. Once you call Vault Loader or one of the bulk APIs, Vault carries out the file processing to create the documents in Vault.

The same files from sandbox test runs can also be reused for subsequent production migrations, provided the files have not changed, by contacting Support to re-link the file staging area from one Vault to another. Further details on re-linking can be found in Scalable FTP.

Migration Mode Header

Document Migration Mode is a Vault setting which loosens some of the Vault constraints that are typically enforced to make the migration of data into Vault run more smoothly. When using the Create Multiple Documents API, this setting is enabled by passing the Document Migration Mode API Header. With the Vault Loader API’s Load Data Objects endpoint, this setting is enabled by passing the documentmigrationmode parameter.

To use this setting the migration user must have the Vault Owner Actions : Document Migration permission in their security profile’s permission set.

Disable Custom functionality

It’s important to ensure that any custom functionality (such as entry actions, custom Java SDK code, or jobs) is disabled if required. Also ensure that reference values, such as Lists of Values (LOVs), exist and are active if they are referenced in migration data.

Document Ingestion Delay

It can take time for migrated documents to appear in Vault searches, renditions, thumbnails, and so on. If documents do not appear when you verify the migration through the Vault UI, the cause may be this ingestion delay.

Suppress Rendition Generation

It is common to suppress document rendition creation during migration into Vault, since the renditions have already been generated by the legacy source system and are migrated directly instead.

Structures (i.e. Binders)

Bulk APIs don’t exist for migrating binders and folders into Vault, therefore sufficient time must be allocated to allow for this to take place. Care must also be taken so as not to breach API rate limits. Consideration should be given as to whether the existing structures are still needed after migrating the data into Vault.

Vault Notifications

Note that after documents are migrated, jobs will run that send notifications to users via email, such as periodic review or expiration notices, and some users may receive a large number of emails when documents are first loaded. It’s recommended that users of each environment be forewarned that this may occur.

Loading Objects

Migrating objects into Vault can either be done using the Create Multiple Object Records API or using the Vault Loader Command Line Interface (CLI) or API, by following the instructions in Loading Object Records.

Record Migration Mode

Record Migration Mode allows the migration of object records in a non-initial state for objects with lifecycles. With the Create Multiple Object Records API, this setting is enabled by passing the Record Migration Mode API Header. With the Vault Loader API’s Load Data Objects endpoint, it is enabled by passing the recordmigrationmode parameter.

To use this setting the migration user must have the Vault Owner Actions : Record Migration permission in their security profile’s permission sets.

Disable Record Triggers and Actions

Record triggers execute custom Vault Java SDK code whenever a data operation occurs on an object. If this custom functionality isn’t needed during the migration, the record triggers can be disabled to prevent them from running. Once the migration is complete, they can be re-enabled.

Testing Migrations

Create Development/Sandbox Vault

A sandbox Vault will need to be created from the production (or validation) Vault using the Administering Sandbox Vaults tools and any configuration customizations will need to be created. This would typically be done in conjunction with an implementation consultant. At this stage it will be possible to determine the structure of the environment into which the data will be migrated.

Reference data, such as picklists and Vault objects like Product are included with the sandbox, but you will need to load other reference data that your migration is based on. Test Data packages can be used to create packages of reference data.

Dry Run Migration

A dry run should be carried out to test the migration logic, data, and confirm timings. It is not necessary to dry run full data volumes. Logic and timings can be validated using smaller datasets. If the migration fails the issues should be corrected in the development environment before being reapplied to the test environment.

Data Verification

Once data has been migrated into Vault, it is necessary to verify the data to check that what has been migrated is as expected. This will involve a number of different checks.

Data Transformation Considerations

When populating Vault metadata, several different complications can occur which will need to be handled during data transformation.

CSV Format

Any CSV files used to create or update documents using Vault Loader must use UTF-8 encoding and conform to RFC4180.

Date Formatting

Any dates migrated into Vault must use the format YYYY-MM-DD.

Date/Time Formatting

Date/time values must use the Coordinated Universal Time (UTC) format YYYY-MM-DDTHH:MM:SS.sssZ, for example 2019-07-04T17:00:00.000Z. The value must end with the Z UTC designator; the millisecond digits can be any value. Care should also be taken when mapping date/time fields to check that they map to the correct day, as this may differ depending on the time zone.
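
The conversion, including the day-shift pitfall, can be sketched with the standard library. The source time zone (UTC-7) is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone

# A local evening timestamp in UTC-7 lands on the NEXT calendar day in UTC.
local = datetime(2019, 7, 4, 20, 0, tzinfo=timezone(timedelta(hours=-7)))
utc = local.astimezone(timezone.utc)

# Format as YYYY-MM-DDTHH:MM:SS.sssZ with exactly three millisecond digits.
vault_value = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
print(vault_value)  # 2019-07-05T03:00:00.000Z  (note the day changed)
```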


Case Sensitivity

Some metadata in Vault is case-sensitive, so it may need converting to match the expected format.

Special Characters

Special characters are common in migrations and loads, especially outside of the USA, so care should be taken to ensure they aren’t lost. However, metadata must not contain problematic special characters such as tabs and smart quotes.

Character Encodings

Take care when saving an Excel™ file to CSV for use in Loader, as this can corrupt the file in ways that are hard to detect. If the file is corrupt it will fail to load, and failure logs containing a record for each failed row will be emailed and will appear in user notifications. The failed records will then need to be corrected and rerun, as detailed in the About Vault Loader help.


Multilingual Data

Make sure your Vault is configured to accept different languages if the data being migrated is multilingual. Details of how to configure Vault to handle multiple languages can be found at About Language & Region Settings.

Multi-value Field Comma Separator

When mapping multi-value fields, a value containing a comma can be entered by escaping the comma with a second comma. For example, "veeva,,vault" is read as the single value "veeva,vault", whereas "veeva,vault" is read as two values.
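
A small helper can apply the comma-doubling escape described above when joining multi-value fields; the helper name is illustrative.

```python
# Join multi-value field values with commas, doubling any literal comma
# inside a single value so Loader does not split it into two values.
def escape_multivalue(values):
    return ",".join(v.replace(",", ",,") for v in values)

print(escape_multivalue(["veeva,vault"]))    # veeva,,vault  (one value)
print(escape_multivalue(["veeva", "vault"])) # veeva,vault   (two values)
```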

Windows™/MacOS™ Formatting

Data formatting can differ per environment. For instance, line separators differ between files created on Windows™ and macOS™, so any loaded data will need checking.

Boolean Fields

When migrating Yes/No fields via the API, values must be formatted as lowercase true or false. This doesn’t apply to Vault Loader, which accepts true and false regardless of case.

Trailing Spaces

Make sure to remove any trailing spaces, especially after any commas.

Preserve Leading Zeros

When loading numeric values from string fields in the source system into string fields in Vault, make sure they are migrated as strings rather than converted to integers, in order to preserve leading zeros.
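
Both the Boolean and leading-zero rules above show up in how a transformation reads and rewrites CSV data. A minimal sketch, with illustrative column names:

```python
import csv
import io

# DictReader never coerces types: every value stays a str, so a site number
# like 00123 keeps its leading zeros. Yes/No values are normalized to the
# lowercase true/false the API expects.
raw = io.StringIO("site_number,active\n00123,Yes\n")
rows = list(csv.DictReader(raw))
for r in rows:
    r["active"] = "true" if r["active"].strip().lower() in ("yes", "true", "y") else "false"
print(rows[0])  # {'site_number': '00123', 'active': 'true'}
```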

Unique Identifier

On migrated documents or object records where Name is not unique, or where Name is system-managed, set the External ID (i.e. external_id__v or external_id__c) to relate the record to the original ID used in the source system. Including this field in Loader files also makes it easier to track success in the API logs.

Maximum Field Size

Migrated text values must not exceed the maximum field size of the target field, as Loader does not truncate migrated data.

References to Users and Persons

Please note that documents and objects can reference users (user__sys) and/or persons (person__sys), and these records must be active in order to be referenced. When referencing people who have left the company or changed names, person__sys is recommended because it does not have to be tied to an actual user account. In addition, user names and person names are not unique, so it’s recommended that external IDs be used for these objects.

Object Lookups

Many object records have relationships with other records. For example, the Study Site object has a field called study_country__v of data type Parent Object, which links it to the Study Country object. If you create a new Study Site record via Loader or the API and you happen to know the ID of the desired Study Country record, you can populate it; however, these IDs differ from one Vault environment to another. This means you would need a lookup table to obtain the specific Vault record IDs using the name__v or external_id__v fields. An easier alternative is to use an object lookup field in the form study_country__vr.name__v = 'United States'.
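
A Loader row using the lookup form might be built as follows. The field names mirror the Study Site example above and may differ in your Vault's data model.

```python
import csv
import io

# The lookup column study_country__vr.name__v resolves the parent Study
# Country by name at load time, so no environment-specific record ID is
# needed in the file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name__v", "study_country__vr.name__v"])
writer.writeheader()
writer.writerow({"name__v": "Site 001", "study_country__vr.name__v": "United States"})
print(buf.getvalue())
```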