This article is intended for customers considering a Vault migration project. It provides suggestions and best practices for planning, developing, and executing a Vault migration.
A Vault migration refers to the loading of large volumes of data or documents into Vault. There are two common migration use cases:
When replacing a legacy system with Vault, it is common to migrate data from the legacy system into Vault. This type of migration usually occurs during the Vault implementation project for a new vault, but can also happen when implementing a new application in an existing vault.
Any business event which requires adding new data to Vault may require incremental migration. For example, a phased rollout, system consolidation, or a company acquisition. Incremental migrations occur independent of an application implementation and can affect a live production Vault application.
At a high level, migrating data into Vault will generally require:
The complexity, effort, and timing of these steps varies based on the data volume, data complexity, system availability requirements, and tools that are used. Choosing the correct approach is the key to a successful migration project.
It is critical that you select an approach that is appropriate for the size and complexity of your migration project. The image below outlines high-level guidance based on the size and complexity of your migration. It is recommended that you speak with your Veeva account team to understand these options prior to starting your project. Migrations can be performed by customers or with assistance from Veeva Technical Services. For large or complicated migrations, Veeva partners with multiple Certified Migration Partners who use validated tools and have experience performing Vault migrations.
The migration plan should consider dependencies between the vault configuration and migration data preparation, as well as provide adequate time for migration testing and validation. It is important that you perform performance testing to properly estimate the time it will take to complete migration activities.
Before carrying out any migration, you must inform Veeva by completing the Vault Migration Planning Form as soon as possible: at least three business days in advance, or at least one week in advance for significant migrations. This notifies Veeva technical operations and support teams of your migration project and allows them to prepare so that it goes smoothly. We ask that you complete this form for any migration that includes over 10,000 documents, over 1,000 large (greater than 1 GB) documents such as videos, over 500,000 object records, or any migration where the customer or services team has concerns. Complete the form once for each environment you will be migrating into.
This topic identifies the common practices that should be considered when migrating any type of data into Vault, including documents, objects, or configuration.
Source data for a migration can come from legacy applications, file shares, spreadsheets, or even an existing vault. The details of extracting data from its source format will depend on the system itself. Often customers who are migrating from a complex source application will choose to work with a Certified Migration Partner who has experience extracting data from that application.
A key consideration for data extraction is minimizing downtime during the cutover from the legacy application to Vault; the cutover is often done over a weekend. To support this, it is recommended that you migrate the majority of data or documents in batches beforehand, while the legacy system is still running. Then, on the cutover weekend, once the legacy system has been turned off, perform a delta migration: extract and load only the data that has changed since the batch run. If the target vault is already live, you can use user access control to hide the batch-migrated documents until the cutover.
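The batch-plus-delta split can be sketched as follows. This is a minimal illustration, assuming the source export carries a last-modified timestamp; the `modified_date__v` column name is hypothetical.

```python
from datetime import datetime, timezone

def split_batch_and_delta(rows, batch_cutoff):
    """Partition extracted rows into the bulk batch (loaded ahead of
    cutover) and the delta (loaded during the cutover window).

    rows: list of dicts with a 'modified_date__v' ISO-8601 timestamp
          (column name is an assumption for illustration).
    """
    batch, delta = [], []
    for row in rows:
        modified = datetime.fromisoformat(row["modified_date__v"])
        # Anything touched after the batch extract belongs in the delta run
        (delta if modified > batch_cutoff else batch).append(row)
    return batch, delta

rows = [
    {"name__v": "SOP-001", "modified_date__v": "2024-05-01T10:00:00+00:00"},
    {"name__v": "SOP-002", "modified_date__v": "2024-06-15T09:30:00+00:00"},
]
cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
batch, delta = split_batch_and_delta(rows, cutoff)
```

In practice the cutoff would be the timestamp of the last batch extract, and the delta run would be executed after the legacy system is frozen.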
Data extracted from the source system will need to be transformed so that it can be loaded into Vault. Vault APIs and Vault Loader accept data in comma separated values (CSV) format. During this process it will be necessary to map data between the legacy system and Vault, taking care to review these Data Transformation Considerations.
When populating document or object fields that reference object records or picklists, you must first ensure the reference data exists in the vault. This reference data can then be linked using either object lookup fields or picklist fields, which eliminates the need to know the system-generated IDs for related records.
To understand what document metadata values need populating during a migration, it’s useful to understand the structure of the Vault Data Model. This can be achieved by running the Vault Configuration Report or via the Document Metadata API.
Vault automatically assigns major and minor document version numbers. The major version starts at 1 and increments each time a new steady state is reached; at that point the minor version resets to 0 and then increments with each minor change. Some legacy systems allow users to manually assign their own version numbers, often starting at zero, so version numbers from the legacy system may not match those for documents created in Vault.
Lifecycle names and target states must be considered when mapping states. Note that source documents in active workflows, or in an "In" state other than In Progress (In Review, In Approval, etc.), should not be migrated to Vault, as migrated documents will not have workflows applied.
When Legacy Signature Pages are to be migrated into Vault they must be loaded as a PDF file.
If audit trail data must be migrated into Vault, this can be done in a variety of ways. The recommended approach is to convert the audit data into PDF format, link it to each document, and bring it in as an archive document.
It is generally recommended that customers either use Vault Loader or a Certified Migration Partner to load data into Vault as these tools have been tested and certified to use best practices in their development.
However, if you determine that you will develop your own migration tool using the Vault API you should consider the following practices:
Using the Loader API or the Loader command line allows you to automate migration tasks while still taking advantage of the Loader service, which handles processing, batching, and error reporting. The Loader service is a product service that has been developed with best practices and tested by Veeva, so building on the Loader API or command line can greatly reduce development time.
Migration should also be done using Bulk APIs for data loading and data verification. Bulk APIs allow you to create a large number of records or documents with a single API call. These APIs are designed for higher data throughput and will minimize the number of calls required. Refer to the table below to see which data types have Bulk APIs.
In any migration that uses the Vault REST API, it's recommended to set the Client ID, so that if any errors occur during the migration, Veeva is better able to assist in determining what went wrong.
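A minimal sketch of building request headers with a Client ID set, assuming the `X-VaultAPI-ClientID` header used by the Vault REST API (the Client ID value itself is illustrative; use a name identifying your organization and tool):

```python
def migration_headers(session_id, client_id="acme-vault-migration"):
    """Headers for a migration API call. X-VaultAPI-ClientID lets Veeva
    attribute API traffic to your migration tool in their logs; the
    session token comes from a prior authentication call."""
    return {
        "Authorization": session_id,
        "X-VaultAPI-ClientID": client_id,  # value is an illustrative assumption
        "Accept": "application/json",
    }

headers = migration_headers("dummy-session-token")
```

The same headers dictionary would then be passed on every call the migration tool makes.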
When migrating data via the Vault REST API, it is very important to consider API rate limits: if the limits are breached and not handled correctly, integrations will be blocked from using the API. To mitigate this, bulk versions of APIs should be used wherever possible. Migration programs should also check the limits on each API call, so that if either the burst or daily limit comes within a 10% threshold of being breached, the program either waits until capacity is available or stops the migration cleanly.
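A sketch of this throttling check, assuming the rate-limit response headers of the Vault REST API (`X-VaultAPI-BurstLimitRemaining` and related; verify the exact names for your API version):

```python
def check_rate_limits(response_headers, threshold=0.10):
    """Inspect Vault rate-limit response headers after each call and
    decide whether to continue, pause, or stop the migration.
    Header names are assumptions to verify against your API version."""
    burst_remaining = int(response_headers["X-VaultAPI-BurstLimitRemaining"])
    burst_limit = int(response_headers["X-VaultAPI-BurstLimit"])
    daily_remaining = int(response_headers["X-VaultAPI-DailyLimitRemaining"])
    daily_limit = int(response_headers["X-VaultAPI-DailyLimit"])

    if daily_remaining < daily_limit * threshold:
        return "stop"      # near the daily cap: stop cleanly, resume later
    if burst_remaining < burst_limit * threshold:
        return "wait"      # near the burst cap: sleep until the window resets
    return "continue"

action = check_rate_limits({
    "X-VaultAPI-BurstLimitRemaining": "40",
    "X-VaultAPI-BurstLimit": "500",
    "X-VaultAPI-DailyLimitRemaining": "90000",
    "X-VaultAPI-DailyLimit": "100000",
})
```

Here the burst headroom (40 of 500) is below the 10% threshold, so the caller would sleep before continuing rather than risk being blocked.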
Consider creating a user specifically for performing migration activities, so that any document or record created during the migration clearly shows it was created as part of the migration.
When migrating data into a live vault, you need to consider the impact on existing users.
Migration can often be a computing resource intensive process. If you are migrating large amounts or complex data, you should schedule migration activities during periods of low user activity such as evenings or weekends.
Configuration Mode is a Vault setting which, when enabled, prevents non-Admin users from logging into the vault. Use Configuration Mode if you need to prevent end users from accessing the vault during the migration.
If you need to hide migrated data from existing users, you can configure user access control to hide data until the cutover.
Migrating documents into Vault can either be done using the Create Multiple Documents API or using the Vault Loader Command Line Interface or API, by following the Create Documents tutorial in Creating & Downloading Documents in Bulk.
Whether loading document files into Vault using the Create Multiple Documents API or Vault Loader, it is necessary to first upload the files to the vault's file staging server. This includes the primary content files and document artifacts such as versions, renditions, and attachments. Do this as far in advance as possible, as the upload can take time. Once you call Vault Loader or one of the batch APIs, Vault carries out the file processing to create the documents in Vault.
Files from sandbox test runs can also be reused for subsequent production migrations, provided the files have not changed, by contacting Support to re-link the file staging area from one vault to another. Further details on re-linking can be found in Scalable FTP.
Document Migration Mode is a Vault setting which loosens some of the constraints that are typically enforced, so that the migration of data into Vault runs more smoothly. When using the Create Multiple Documents API, this setting is enabled by passing the Document Migration Mode API Header. With the Vault Loader API's Load Data Objects endpoint, this setting is enabled by passing the corresponding migration mode parameter.
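As an illustrative sketch, a bulk document create request with migration mode enabled might be assembled like this. No request is actually sent here, and the endpoint path, API version, and `X-VaultAPI-MigrationMode` header name should be verified against your Vault API version's documentation.

```python
API_VERSION = "v24.1"  # illustrative; use your vault's supported version

def build_batch_create_request(vault_dns, session_id, csv_payload):
    """Assemble (but do not send) a Create Multiple Documents call with
    Document Migration Mode enabled via its API header."""
    return {
        "method": "POST",
        "url": f"https://{vault_dns}/api/{API_VERSION}/objects/documents/batch",
        "headers": {
            "Authorization": session_id,
            "Content-Type": "text/csv",
            "X-VaultAPI-MigrationMode": "true",  # enables Document Migration Mode
        },
        "body": csv_payload,
    }

req = build_batch_create_request("myvault.veevavault.com", "sess-token",
                                 "file,name__v,type__v\n...")
```

The CSV body shown is truncated; in a real run it would reference files already uploaded to the file staging server.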
To use this setting, the migration user must have the Vault Owner Actions: Document Migration permission in their security profile's permission set.
It's important to ensure that any custom functionality (such as entry actions, custom Java SDK code, or jobs) is disabled if required. Also ensure that reference values, such as Lists of Values (LOVs), exist and are active if they are referenced in migration data.
Once documents have been migrated, it can take time for them to appear in Vault searches and for renditions, thumbnails, and similar artifacts to be generated. When verifying migrated documents through the Vault UI, keep in mind that missing documents may simply reflect this ingestion delay.
It is common to suppress document rendition creation during migration into Vault, since the renditions have already been generated by the legacy source system and are migrated directly instead.
Bulk APIs don’t exist for migrating binders and folders into Vault, therefore sufficient time must be allocated to allow for this to take place. Care must also be taken so as not to breach API rate limits. Consideration should be given as to whether the existing structures are still needed after migrating the data into Vault.
Note that after documents are migrated, jobs will run that send email notifications to users, such as periodic review or expiration notices, and some users may receive a large number of emails when documents are first loaded. It's recommended that users of each environment be forewarned that this may occur.
Migrating objects into Vault can either be done using the Create Multiple Object Records API or using the Vault Loader Command Line Interface (CLI) or API, by following the instructions in Loading Object Records.
Record Migration Mode allows the migration of object records in a non-initial state for objects with lifecycles. When using the Create Multiple Object Records API, this setting is enabled by passing the Record Migration Mode API Header. With the Vault Loader API's Load Data Objects endpoint, this setting is enabled by passing the corresponding migration mode parameter.
To use this setting, the migration user must have the Vault Owner Actions: Record Migration permission in their security profile's permission set.
Record triggers execute custom Vault Java SDK code whenever a data operation occurs on an object. If this custom functionality isn't needed during the migration, the record triggers can be disabled to prevent them from running and then re-enabled once the migration is complete.
A sandbox Vault will need to be created from the production (or validation) vault using the Administering Sandbox Vaults tools and any configuration customizations will need to be created. This would typically be done in conjunction with an implementation consultant. At this stage it will be possible to determine the structure of the environment into which the data will be migrated.
Reference data, such as picklists and Vault objects like Product are included with the sandbox, but you will need to load other reference data that your migration is based on. Test Data packages can be used to create packages of reference data.
A dry run should be carried out to test the migration logic, data, and confirm timings. It is not necessary to dry run full data volumes. Logic and timings can be validated using smaller datasets. If the migration fails the issues should be corrected in the development environment before being reapplied to the test environment.
Once data has been migrated into Vault, you must verify that what has been migrated is as expected. This involves a number of different checks, such as comparing record and document counts against the source system, spot-checking field values, and confirming that renditions and relationships loaded correctly.
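One such check, a record-count comparison, can be driven by a VQL query. This is a sketch only: no request is sent, and the query endpoint path, API version, and response shape (`responseDetails.total`) are assumptions to verify against your Vault API version's documentation.

```python
def build_count_query(vault_dns, session_id, vql="SELECT id FROM documents"):
    """Assemble (but do not send) a VQL query request used to count
    migrated documents for verification."""
    return {
        "method": "POST",
        "url": f"https://{vault_dns}/api/v24.1/query",  # version illustrative
        "headers": {"Authorization": session_id,
                    "Content-Type": "application/x-www-form-urlencoded"},
        "data": {"q": vql},
    }

def counts_match(source_count, vault_response):
    # Vault query responses report the total matching records under
    # responseDetails.total (verify for your API version).
    return source_count == vault_response["responseDetails"]["total"]

ok = counts_match(2, {"responseDetails": {"total": 2}})
```

Count checks catch gross load failures cheaply; field-level spot checks are still needed to confirm values mapped correctly.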
When populating Vault metadata, several different complications can occur which will need to be handled during data transformation.
Any dates migrated into Vault must use the format YYYY-MM-DD. Date/time values must use the Coordinated Universal Time (UTC) format YYYY-MM-DDTHH:MM:SS.sssZ, for example 2019-07-04T17:00:00.000Z; the value must end with the Z UTC designator, and the millisecond digits can be any value. Take care when mapping date/time fields to check that they map to the correct day, as this may differ depending on the time zone.
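A minimal sketch of converting a source timestamp to this UTC format, including the day-shift concern for non-UTC source systems:

```python
from datetime import datetime, timedelta, timezone

def to_vault_datetime(dt):
    """Format a timezone-aware datetime in the UTC form Vault expects:
    YYYY-MM-DDTHH:MM:SS.sssZ (milliseconds, 'Z' suffix)."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

# 19:00 in a UTC+2 source system becomes 17:00 UTC on the same day,
# but values near midnight can land on a different calendar day once
# converted, which is the day-mapping pitfall noted above.
local = datetime(2019, 7, 4, 19, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = to_vault_datetime(local)
```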
Some metadata in Vault is case-sensitive so may need converting to match the expected format.
Many migrations and loads include special characters in their data, which is especially common outside of the USA, so care should be taken to ensure legitimate characters such as accented letters are not lost. However, metadata must not contain problematic special characters, for example tabs and smart quotes.
Care should be taken when saving from Excel™ to CSV for use in Loader, as this can corrupt the file in ways that are hard to detect. If the file is corrupt it will fail to load, and failure logs containing a record for each failed row will be emailed and appear in user notifications. The failed records must then be corrected and rerun, as detailed in the About Vault Loader help.
Make sure your vault is configured to accept different languages if the data being migrated in is multilingual. Details of how to configure Vault to handle multiple languages can be found at About Language & Region Settings.
When mapping multi-value fields, a value that itself contains a comma can be entered by quoting the cell and escaping the comma with a double comma: "veeva,,vault" is equivalent to the single value veeva,vault.
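A small sketch of producing such a cell, assuming the double-comma escaping convention shown in the example above (verify against the Vault Loader documentation):

```python
def escape_multivalue(values):
    """Join picklist values for one CSV cell, doubling any literal commas
    inside a value and quoting the whole cell. The escaping convention is
    taken from the article's example; confirm it for your Loader version."""
    escaped = [v.replace(",", ",,") for v in values]
    return '"' + ",".join(escaped) + '"'

cell = escape_multivalue(["veeva,vault"])   # a single value containing a comma
multi = escape_multivalue(["red", "blue"])  # two separate values
```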
Data formatting can differ per environment. For instance, line separators differ between files created on Windows™ and macOS™, so any loaded data will need checking.
When migrating Yes/No fields via the API, they must be formatted as true or false, in lowercase. This doesn't apply to Vault Loader, which handles true and false regardless of case.
Make sure to remove any trailing spaces, especially after any commas.
When loading numbers from String fields in the source system into String fields in Vault, make sure they are migrated as strings rather than converted to integers, to preserve leading zeros.
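The two preceding points can be handled together in a small CSV-cleaning pass. This is a sketch with hypothetical column names; Python's csv module reads every cell as a string, which keeps leading zeros intact.

```python
import csv
import io

def clean_row(row):
    """Strip stray spaces (common after commas in hand-edited files) from
    keys and values while keeping every value a string, so leading zeros
    survive the round trip."""
    return {key.strip(): value.strip() for key, value in row.items()}

raw = "external_id__c, batch_number__c\nDOC-001 , 00042 \n"
rows = [clean_row(r) for r in csv.DictReader(io.StringIO(raw))]
```

Running the extract through a pass like this before loading avoids silent mismatches caused by " 00042 " failing to equal "00042".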
On migrated documents or object records where Name is not unique, or Name is system-managed, it will be necessary to set the External ID (i.e. external_id__c) to relate the record to the original ID used in the source system. Having this field in Loader files also makes it easier to track success in the API logs.
Migrated text values must not exceed the maximum field size, as Loader does not truncate migrated data.
Note that documents and objects can reference users (user__sys) and/or persons (person__sys). These records must be active in order to be referenced. When referencing people who have left the company or had a name change, person__sys is recommended, as it does not have to be tied to an actual user account. In addition, user names and person names are not unique, so it's recommended that external IDs be used for these objects.
Many object records have relationships with other records. For example, the Study Site object has a field called study_country__v of data type Parent Object, which links it to the Study Country object. If you create a new Study Site record via Loader or the API and you happen to know the ID of the desired Study Country record, you can populate it directly; however, these IDs differ from one Vault environment to another. This means you would need a lookup table to obtain the specific Vault record IDs, using the external_id__v fields to perform the lookup. An easier alternative is to use an object lookup field in the form study_country__vr.name__v = 'United States'.
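A sketch of building a Loader CSV that resolves the parent record via the object lookup field rather than a vault-specific record ID, using the field names from the example above:

```python
import csv
import io

# The lookup column study_country__vr.name__v lets Vault resolve the
# parent Study Country by name at load time, so the CSV stays portable
# across environments whose record IDs differ.
fieldnames = ["name__v", "study_country__vr.name__v"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({"name__v": "Site 1001",
                 "study_country__vr.name__v": "United States"})
csv_payload = buf.getvalue()
```

The same pattern works with external_id__v lookups (e.g. a study_country__vr.external_id__v column) when names are not unique.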