This article is intended for customers considering a Vault migration project. It provides suggestions and best practices for planning, developing, and executing a Vault migration.
A Vault migration refers to the loading of large volumes of data or documents into Vault. There are two common migration use cases:
When replacing a legacy system with Vault, it is common to migrate data from the legacy system into Vault. This type of migration usually occurs during the Vault implementation project for a new Vault, but can also happen when implementing a new application in an existing Vault.
Any business event that requires adding new data to Vault, such as a phased rollout, a system consolidation, or a company acquisition, may require an incremental migration. Incremental migrations occur independently of an application implementation and can affect a live production Vault application.
At a high level, data migration into Vault generally requires:
The complexity, effort, and timing of these steps vary based on the data volume, data complexity, system availability requirements, and tools that are used. Choosing the correct approach is the key to a successful migration project.
It is critical that you select an approach that is appropriate for the size and complexity of your migration project. Migrations can be performed by customers or with assistance from Veeva Migration Services and Certified Migration Partners who use validated tools and have experience performing Vault migrations. It is recommended that you speak with your Veeva account team to understand these options prior to starting your project.
The migration plan should consider dependencies between the Vault configuration and data preparation, as well as provide adequate time for testing and validation. It is important that you conduct performance testing to properly estimate the time it will take to complete migration activities.
Before carrying out a migration, notify Veeva by completing the Vault Migration Planning Form at least three business days in advance, or at least one week in advance for large migrations. This notifies Veeva technical operations and support teams of your migration project and allows them to prepare. Fill out this form for any migration that includes more than 10,000 documents, 1,000 large (greater than 1 GB) documents such as videos, or 500,000 object records, as well as any migration where the customer or the services team has concerns. Complete the form for each environment you will be migrating into.
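The form thresholds can be turned into a simple pre-flight check in a migration script. The sketch below is a minimal illustration, assuming each limit means "more than N" and that "1 GB" means 10**9 bytes; the MigrationBatch class and its fields are hypothetical names, not part of any Vault tool.

```python
from dataclasses import dataclass

# Thresholds from the planning guidance; reading each as "more than N"
# and 1 GB as 10**9 bytes are assumptions.
DOCUMENT_LIMIT = 10_000
LARGE_DOCUMENT_LIMIT = 1_000
LARGE_DOCUMENT_BYTES = 10**9
OBJECT_RECORD_LIMIT = 500_000


@dataclass
class MigrationBatch:
    """Hypothetical summary of what will be loaded into one environment."""
    document_sizes: list[int]  # size of each document file, in bytes
    object_records: int
    has_concerns: bool = False


def needs_planning_form(batch: MigrationBatch) -> bool:
    """Return True if the Vault Migration Planning Form should be submitted."""
    large_docs = sum(1 for size in batch.document_sizes if size > LARGE_DOCUMENT_BYTES)
    return (
        len(batch.document_sizes) > DOCUMENT_LIMIT
        or large_docs > LARGE_DOCUMENT_LIMIT
        or batch.object_records > OBJECT_RECORD_LIMIT
        or batch.has_concerns
    )
```

Because the form is completed per environment, the check would be run once for each target Vault (for example, validation and production).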
This section identifies the common practices that should be considered when migrating documents, objects, or configuration into Vault.
Source data for a migration can come from legacy applications, file shares, spreadsheets, or even an existing Vault. The details of extracting data from its source format will depend on the system itself. Customers who are migrating from a complex source application often choose to work with a Certified Migration Partner who has experience extracting data from that application.
A key consideration for data extraction is minimizing downtime during the cutover from the legacy application to Vault, which is often done over a weekend. To support this, migrate the majority of data or documents in batches beforehand while the legacy system is still running. Then, on the cutover weekend, once the legacy system has been turned off, perform a delta migration that extracts and loads only the data that has changed since the batch run. If the target Vault is already live, you can use user access control to hide the batch-migrated documents until the cutover.
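Selecting the delta for cutover weekend usually comes down to comparing a change timestamp against the time the last batch run started. The sketch below assumes each extracted record is a dict with a hypothetical modified_at key holding a timezone-aware datetime; real legacy systems expose change tracking in their own ways, and deletions would need separate handling.

```python
from datetime import datetime, timezone


def select_delta(records, last_batch_run):
    """Return records created or modified after the last batch run.

    Each record is assumed to carry a hypothetical 'modified_at' key with a
    timezone-aware datetime. Deleted records are not detected here.
    """
    return [r for r in records if r["modified_at"] > last_batch_run]


# Usage: only record B changed after the batch run started.
batch_started = datetime(2024, 1, 5, tzinfo=timezone.utc)
legacy_records = [
    {"id": "A", "modified_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "B", "modified_at": datetime(2024, 1, 6, tzinfo=timezone.utc)},
]
delta = select_delta(legacy_records, batch_started)
```

Using the batch run's start time, rather than its end time, as the cutoff errs on the side of re-extracting records that changed while the batch was running.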
Data extracted from the legacy system needs to be transformed before being migrated into Vault. Vault APIs and Vault Loader accept data in comma-separated values (CSV) format. During this process it’s necessary to map data between the legacy system and Vault. Review these Data Transformation Considerations before transforming your data.
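Mapping between the legacy system and Vault typically involves both renaming columns and translating values. The sketch below shows one way to express that mapping in code; the legacy column names and the TYPE_MAP values are hypothetical, while name__v, type__v, and lifecycle__v are used only as illustrative Vault field names.

```python
# Hypothetical mapping from legacy column names to Vault field names.
FIELD_MAP = {
    "DocTitle": "name__v",
    "DocType": "type__v",
    "Lifecycle": "lifecycle__v",
}

# Hypothetical mapping from legacy type codes to Vault document types.
TYPE_MAP = {"SOP": "Standard Operating Procedure"}


def transform(legacy_row: dict) -> dict:
    """Rename legacy columns and translate values into Vault's vocabulary."""
    # Columns with no mapping are dropped rather than passed through.
    vault_row = {FIELD_MAP[k]: v for k, v in legacy_row.items() if k in FIELD_MAP}
    if "type__v" in vault_row:
        # An unmapped type raises KeyError so bad metadata is caught before loading.
        vault_row["type__v"] = TYPE_MAP[vault_row["type__v"]]
    return vault_row
```

Failing on unmapped values is a deliberate choice: it is cheaper to fix a mapping gap before a load than to correct documents after they are in Vault.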
When populating document or object fields that reference object records or picklists, first ensure the reference data exists in Vault. This reference data can then be linked using either object lookup fields or picklist fields, which eliminates the need to know the system-generated IDs of related records.
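A quick way to confirm reference data exists is to compare the values your migration data references against the values already present in the target Vault. The sketch below assumes you have exported those existing values (for example, active picklist values or record names) into a Python set beforehand; the function and field names are hypothetical.

```python
def find_missing_references(rows, field, existing_values):
    """Return the distinct values in `field` that do not yet exist in Vault.

    `existing_values` is assumed to be an export of the active picklist
    values or object record names from the target Vault. Blank values are
    ignored.
    """
    referenced = {row[field] for row in rows if row.get(field)}
    return sorted(referenced - set(existing_values))


# Usage: "FR" must be created in Vault before these rows are loaded.
rows = [{"country": "US"}, {"country": "FR"}, {"country": ""}]
missing = find_missing_references(rows, "country", {"US"})
```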
To understand what document metadata values need to be populated during a migration, review the structure of the Vault Data Model. This can be achieved by running the Vault Configuration Report or via the Document Metadata API.
Vault automatically assigns major and minor document version numbers. The major version starts at one and then increments each time a new steady state is reached. At that time the minor version resets to zero and then increments with each minor change. Some legacy systems allow users to manually assign their own version numbers. Other legacy systems start version numbers at zero instead of one. As a result, the version number from the legacy system may not match those for documents created in Vault.
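To see how legacy version labels may diverge from Vault's numbering, it can help to replay a document's history under the major/minor rules described above. This sketch assumes that draft versions before the first steady state are numbered 0.1, 0.2, and so on, so the first steady state becomes 1.0; the input format is a hypothetical list of (legacy_label, is_steady_state) tuples.

```python
def vault_versions(legacy_versions):
    """Assign Vault-style major.minor numbers to an ordered version history.

    `legacy_versions` holds (legacy_label, is_steady_state) tuples in
    chronological order. Assumes drafts before the first steady state are
    numbered 0.1, 0.2, ...
    """
    major, minor = 0, 0
    result = []
    for legacy_label, is_steady in legacy_versions:
        if is_steady:
            # Reaching a steady state increments the major version and
            # resets the minor version to zero.
            major, minor = major + 1, 0
        else:
            minor += 1
        result.append((legacy_label, f"{major}.{minor}"))
    return result


# Usage: a legacy system that numbers from zero.
history = [("0", False), ("1", True), ("1a", False), ("2", True)]
mapped = vault_versions(history)
```

The output pairs each legacy label with the number Vault would assign, which makes mismatches (legacy "0" versus Vault "0.1", legacy "1" versus "1.0") easy to review with stakeholders before migration.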
Lifecycle names and target states must be considered when mapping states. Source documents in “In” states (In Review, In Approval, and so on), other than In Progress, should not be migrated into Vault. Vault does not apply workflows to migrated documents, so such documents would have no workflow to move them out of that state.
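A state mapping can encode the exclusion rule directly, so that documents in workflow states are filtered out during transformation. The legacy and Vault state names in STATE_MAP below are hypothetical placeholders; the rule that "In ..." states other than In Progress are excluded comes from the guidance above.

```python
# Hypothetical mapping from legacy states to target Vault lifecycle states.
STATE_MAP = {
    "Draft": "Draft",
    "In Progress": "In Progress",
    "Effective": "Effective",
    "Obsolete": "Obsolete",
}


def map_state(legacy_state: str):
    """Return the Vault state, or None if the document should not be migrated.

    Documents in "In ..." states other than In Progress are excluded because
    Vault does not start workflows on migrated documents. Unmapped states
    raise KeyError.
    """
    if legacy_state.startswith("In ") and legacy_state != "In Progress":
        return None
    return STATE_MAP[legacy_state]
```

Returning None rather than raising lets the caller write excluded documents to a separate report so they can be completed in the legacy system before cutover.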
Legacy Signature Pages must be in PDF format to be migrated into Vault.
If audit trail data is required for a migration, this can be done in a variety of ways. It’s recommended to convert the audit data into PDF format, link it to each document, and migrate it as an archived document.
We recommend that customers use either Vault Loader or a Certified Migration Partner to load data into Vault. These options have been tested and certified as best practice.
However, if you determine that you will develop your own migration tool using the Vault API you should consider the following:
The Vault Loader API endpoints or the Loader command line allow you to automate migration tasks. The Loader service handles processing, batching, and error reporting, and is developed and tested by Veeva. Using the Vault Loader API endpoints or the Loader command line can greatly reduce the migration time.
Migration should be performed using Bulk APIs for data loading and data verification. Bulk APIs allow you to create a large number of records or documents with a single API call. These APIs are designed for higher data throughput and will minimize the number of calls required. Refer to the table below to see which data types have Bulk APIs.
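When calling a bulk API from your own tool, the input has to be split into batches no larger than the endpoint's per-call limit. The sketch below uses a default of 500 rows per call; that number is an assumption for illustration, so check the documented limit of the specific bulk endpoint you call.

```python
from itertools import islice


def batches(rows, size=500):
    """Yield successive lists of at most `size` rows for bulk API calls.

    The default of 500 is an assumed per-call limit, not a confirmed value
    for every endpoint.
    """
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Usage: 1,201 rows become three bulk calls.
sizes = [len(b) for b in batches(range(1201))]
```

Because it works on any iterable, the generator can stream rows from a large CSV file without loading the whole file into memory.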
In any migrations that use the Vault REST API, it’s recommended to set the Client ID. If any errors occur during the migration, Veeva will be better able to assist in troubleshooting.
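The Client ID is sent with each request. The sketch below builds it as an HTTP header, assuming the header name X-VaultAPI-ClientID and the company-organization-team-side-program naming pattern described in the Vault API documentation as we understand it; the allowed-character check is also an assumption, so verify both against the API reference for your Vault release.

```python
import re


def client_id_header(company, organization, team, side, program):
    """Build a Client ID header for a migration program.

    Header name, naming pattern, and allowed characters are assumptions to
    be confirmed against the Vault API reference.
    """
    if side not in ("server", "client"):
        raise ValueError("side must be 'server' or 'client'")
    client_id = "-".join([company, organization, team, side, program])
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", client_id):
        raise ValueError(f"unexpected characters in client ID: {client_id!r}")
    return {"X-VaultAPI-ClientID": client_id}


# Usage: merge into the headers sent with every migration API call.
headers = client_id_header("acme", "qa", "migration", "server", "docloader")
```

A distinctive, descriptive Client ID makes it easy for Veeva Support to find the migration's calls in API usage logs.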
When migrating data via the Vault REST API, it’s important to consider API rate limits. If an integration exceeds these limits, it is blocked from using the API. To avoid exceeding limits, use bulk versions of APIs whenever possible, and write migration programs so that they check the limits on every API call. If the remaining burst or daily limit is within 10% of being exhausted, the program should either wait until capacity is available or stop the migration process.
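The limit check can be a small function applied to the response headers of each call. This sketch assumes Vault reports limits in headers named X-VaultAPI-BurstLimit, X-VaultAPI-BurstLimitRemaining, X-VaultAPI-DailyLimit, and X-VaultAPI-DailyLimitRemaining, and it ignores any limit whose headers are absent. Pausing on the burst limit and stopping on the daily limit is a design choice for illustration, not a Veeva requirement.

```python
THRESHOLD = 0.10  # act when 10% or less of a limit remains


def next_action(headers):
    """Return "continue", "wait", or "stop" based on rate-limit headers.

    `headers` is a plain dict (case-sensitive keys). Header names are
    assumptions; a limit whose headers are missing is ignored.
    """
    def nearly_exhausted(limit_name):
        limit = headers.get(f"X-VaultAPI-{limit_name}")
        remaining = headers.get(f"X-VaultAPI-{limit_name}Remaining")
        if limit is None or remaining is None:
            return False
        return int(remaining) <= THRESHOLD * int(limit)

    if nearly_exhausted("DailyLimit"):
        return "stop"  # daily capacity does not recover within the run
    if nearly_exhausted("BurstLimit"):
        return "wait"  # burst capacity recovers after a short window
    return "continue"
```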
Consider creating a user specifically for performing migration activities, so it’s clear that data creation and any related activities were done as part of the migration. Every document or record this user creates will then clearly show that it was created by the migration.
Consider the impact on existing users when migrating data into a live Vault.
Migrations are often computing-intensive. For large or complicated migrations, schedule migration activities during periods of low user activity, such as evenings or weekends.
When enabled, Configuration Mode prevents non-Admin users from logging into Vault. Use Configuration Mode if you need to prevent end-users from accessing Vault during a migration.
You can configure user access control to hide migrated data from existing users until the cutover is complete.
Migrating documents into Vault can be done using the Create Multiple Documents endpoint. An alternative is to use the Vault Loader Command Line Interface (CLI) or API by following the tutorial for Creating & Downloading Documents in Bulk.
When loading documents into Vault, first upload the files to the Vault file staging server. This includes the primary source files and document artifacts such as versions, renditions and attachments. This should be done far in advance, as the upload can take time. Vault Loader or one of the bulk APIs carry out the file processing to create documents in Vault.
The same files from sandbox test runs can be reused for subsequent production migrations if the files haven’t changed. To do this, re-link the file staging area from one Vault to another by contacting Support. Further details on re-linking can be found in Scalable FTP.
Document Migration Mode is a Vault setting that relaxes some of the constraints Vault normally enforces, so that data can be migrated into Vault more smoothly. To enable this setting through the API, use the Create Multiple Documents or Load Data Objects endpoints.
To use this setting, the migration user must have the Vault Owner Actions : Document Migration permission in their security profile’s permission set.
If required, disable custom functionality such as entry actions, custom Java SDK code, or jobs. Ensure that reference values, such as Lists of Values (LOVs), exist and are active if they are referenced in migration data.
After documents have been migrated, it can take time for them to appear in Vault searches and for their renditions and thumbnails to become available. For large migrations, document indexing can take several hours. Account for this ingestion delay when verifying the existence of migrated documents in Vault.
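Verification scripts can absorb the ingestion delay by polling with exponential backoff instead of failing on the first miss. In the sketch below, check is a hypothetical caller-supplied function (for example, one that queries Vault for a document by external ID) that returns True once the document is visible; the timing defaults are arbitrary.

```python
import time


def wait_until_visible(check, timeout_s, initial_delay_s=1.0, max_delay_s=60.0, sleep=time.sleep):
    """Poll `check()` with exponential backoff until True or the timeout.

    Returns True if `check()` succeeded, False if the next wait would pass
    the deadline. `sleep` is injectable so the function can be tested.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        if check():
            return True
        if time.monotonic() + delay > deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

Capping the delay keeps polling responsive during multi-hour indexing, while the doubling keeps the number of API calls (and rate-limit usage) low.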
It is common to suppress document rendition generation or provide your own renditions for Vault migrations. If you choose not to suppress renditions, it will take a significant amount of time for Vault to process large quantities of renditions. See the Rendition Status page to monitor the progress of rendition jobs.
Bulk APIs don’t exist for migrating binders and folders into Vault, so allocate sufficient time for this step. Consider whether the existing structures are still needed after migrating the data into Vault.
After documents are migrated, jobs such as periodic review or expiration may run and send email notifications to users. Warn users in each environment that this may occur, as some users may receive a large number of emails.
Migrate objects into Vault using the Create Object Records endpoint. An alternative is to use the Vault Loader Command Line Interface (CLI) or API by following the tutorial for Loading Object Records.
Record Migration Mode allows the migration of object records in non-initial states within lifecycles. Use the Create Object Records or Load Data Objects endpoints to enable this setting using the API.
To use this setting, the migration user must have the Vault Owner Actions : Record Migration permission in their security profile’s permission set.
Record triggers execute custom Vault Java SDK code whenever a data operation on an object occurs. If custom functionality isn’t needed during the migration, disable the record triggers to prevent them from running. These can be re-enabled once the migration is complete.
Create a sandbox Vault from the production (or validation) Vault and perform any custom configurations. This is typically done in conjunction with an implementation. At this stage, you can determine the structure of the environment into which the data will be migrated.
Reference data, such as picklists and Vault objects, is included with the sandbox, but you will need to load any other reference data that your migration depends on. Use Test Data packages to package this reference data.
Perform a dry run migration to test the migration logic, data, and approximate timings. It’s not necessary to dry run full data volumes. Logic and timings can be validated using smaller datasets. If the migration fails, correct the issues in the development environment before migrating to the test environment.
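Timings from a smaller dry run can be extrapolated to estimate the full run. The sketch below is a simple linear extrapolation, assuming throughput stays constant at full volume; that assumption is optimistic, because indexing, rendition processing, and other activity in the Vault can slow large runs, so treat the result as a lower bound.

```python
def estimate_hours(dry_run_items, dry_run_seconds, total_items):
    """Linearly extrapolate full-migration duration, in hours, from a dry run."""
    if dry_run_items <= 0 or dry_run_seconds <= 0:
        raise ValueError("dry run must have loaded items and taken time")
    items_per_second = dry_run_items / dry_run_seconds
    return total_items / items_per_second / 3600


# Usage: 5,000 documents took 30 minutes in the dry run.
hours = estimate_hours(dry_run_items=5_000, dry_run_seconds=1_800, total_items=500_000)
```

Here the estimate is about 50 hours, longer than a typical cutover weekend, which is the kind of result that argues for loading most documents in batches ahead of the cutover.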
Once data has been migrated into Vault, verify the data is as expected. This involves a number of different checks, such as:
Several complications can occur when populating Vault metadata. Consider the following best practices to transform data before a migration.
CSV files used to create or update documents using Vault Loader must use UTF-8 encoding and conform to RFC 4180.
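Generating load files programmatically avoids many encoding and quoting problems. The sketch below uses Python's standard csv module, whose default "excel" dialect quotes fields containing commas, quotes, or line breaks, doubles embedded quotes, and ends each record with CRLF, which matches the RFC 4180 conventions. The file is written as UTF-8 without a byte-order mark; the function names are hypothetical.

```python
import csv
import io


def to_loader_csv(rows, fieldnames):
    """Serialize dict rows as RFC 4180-style CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_loader_csv(path, rows, fieldnames):
    """Write the CSV as UTF-8; newline="" stops Python re-translating CRLF."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_loader_csv(rows, fieldnames))
```

Writing the line terminator explicitly also keeps output identical whether the script runs on Windows or macOS.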
Dates migrated into Vault must use the format YYYY-MM-DD.
Date/time values must be converted to Coordinated Universal Time (UTC) and use the format YYYY-MM-DDTHH:MM:SS.sssZ, for example, 2019-07-04T17:00:00.000Z. The value must therefore end with the milliseconds and the Z UTC designator, as in .000Z, although the millisecond digits can be any number. Ensure that date/time fields map to the correct day, since converting a local time to UTC can move it to a different date depending on the time zone.
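Converting legacy timestamps is safest when the legacy system's time zone is attached explicitly before conversion. The sketch below formats a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS.sssZ in UTC and refuses naive datetimes, since their time zone, and therefore their UTC date, is ambiguous. The fixed UTC-4 offset in the usage line is only an example.

```python
from datetime import datetime, timedelta, timezone


def vault_datetime(dt: datetime) -> str:
    """Convert a timezone-aware datetime to YYYY-MM-DDTHH:MM:SS.sssZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("attach the legacy system's time zone before converting")
    utc = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so append milliseconds manually.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


# Example: 1 p.m. at UTC-4 (US Eastern Daylight Time) on 4 July 2019.
edt = timezone(timedelta(hours=-4))
print(vault_datetime(datetime(2019, 7, 4, 13, 0, tzinfo=edt)))  # 2019-07-04T17:00:00.000Z
```

A late-evening local time such as 22:30 at UTC-4 converts to 02:30 UTC on the following day, which is exactly the day shift to check for.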
If Vault metadata is case-sensitive, convert it to match the expected format.
Metadata must not contain special characters, for example, tabs and smart quotes. These special characters can be lost when migrating data into Vault.
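Special characters can be normalized during transformation rather than fixed by hand. The sketch below replaces curly quotes with straight quotes and tabs with spaces, then trims surrounding whitespace. The replacement table is an assumption: it also converts non-breaking spaces, and you may need to extend it for other characters found in your source data.

```python
# Characters that commonly arrive from word processors, and their
# plain-text replacements. The choice of replacements is an assumption.
REPLACEMENTS = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\t": " ",                     # tabs
    "\u00a0": " ",                 # non-breaking space
})


def clean_value(value: str) -> str:
    """Replace smart quotes and tabs, then trim leading/trailing whitespace."""
    return value.translate(REPLACEMENTS).strip()
```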
Saving Excel™ files in CSV format for use with Vault Loader can corrupt the file in an undetectable manner. If the file becomes corrupt, your load will fail. Failure logs contain a record of each row that has failed and are accessible by email or Vault notification. Correct the CSV files to continue loading.
If the data being migrated is multilingual, ensure your Vault is configured to support different languages.
When mapping multi-value fields, values containing commas can be entered through quoting and escaping. For example, "veeva,,vault" is equivalent to
Data formatting can differ between environments. For instance, line separators differ between files created on Windows™ (CRLF) and on macOS™ (LF).
Format Yes/No fields as lowercase true or false when migrating using the API. This doesn’t apply to Vault Loader, as it handles boolean values regardless of case.
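Legacy systems store Yes/No values in many spellings, so a normalization step helps when loading through the API, which the guidance implies expects lowercase values. The accepted legacy spellings below are assumptions; anything unrecognized raises an error instead of silently becoming false.

```python
# Legacy spellings treated as true or false; extend for your source system.
TRUE_VALUES = {"y", "yes", "true", "1"}
FALSE_VALUES = {"n", "no", "false", "0"}


def to_api_boolean(value) -> str:
    """Normalize a legacy Yes/No value to lowercase "true" or "false"."""
    normalized = str(value).strip().lower()
    if normalized in TRUE_VALUES:
        return "true"
    if normalized in FALSE_VALUES:
        return "false"
    raise ValueError(f"unrecognized Yes/No value: {value!r}")
```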
Remove any trailing spaces from metadata. These are commonly found after commas.
Migrate numbers in String fields as String values to preserve leading zeros and prevent their conversion to integers.
On documents or object records where Name is not unique or is system-managed, set the External ID (external_id__c) to relate the record to its original ID in the legacy system. This field also helps distinguish between records in success and failure logs.
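External IDs make it straightforward to reconcile what was extracted with what was loaded. The sketch below compares legacy IDs against a success log supplied as CSV text; id_column is a hypothetical parameter, because the exact header of your success log should be checked rather than assumed.

```python
import csv
import io


def reconcile(source_ids, success_log_csv, id_column):
    """Compare legacy external IDs against the IDs in a success log.

    Returns (missing, unexpected): IDs that were extracted but not loaded,
    and IDs that were loaded but not in the extract, each sorted.
    """
    loaded = {row[id_column] for row in csv.DictReader(io.StringIO(success_log_csv))}
    expected = set(source_ids)
    return sorted(expected - loaded), sorted(loaded - expected)


# Usage with a hypothetical two-row success log.
log = "id,external_id__c\n101,LEG-1\n102,LEG-2\n"
missing, unexpected = reconcile(["LEG-1", "LEG-2", "LEG-3"], log, "external_id__c")
```

Any ID in missing should then appear in the failure log; if it appears in neither, the row was likely dropped before the load.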
Values in Long Text fields must not exceed the maximum length configured in Vault. Vault Loader does not truncate these values.
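Since over-length values are not truncated, checking lengths before loading avoids failed rows. The sketch below reports offending rows; max_length must be set to the field's configured maximum in your Vault, and it counts length with Python's len (Unicode code points), which is an assumption about how Vault measures length.

```python
def overlong_rows(rows, field, max_length):
    """Return (row_number, length) pairs for values longer than `max_length`.

    Row numbers are 1-based and exclude the header row. Missing or empty
    values are treated as length zero.
    """
    problems = []
    for row_number, row in enumerate(rows, start=1):
        length = len(row.get(field) or "")
        if length > max_length:
            problems.append((row_number, length))
    return problems
```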
Documents and object records can reference User (user__sys) and Person (person__sys) records. These records must be active in order to be referenced. If referencing people who have left the company or have had a name change, reference a Person record, as it does not have to be linked to a Vault user account. User names and person names are not unique, so reference these objects by external ID.
Many object records have relationships with other records. For example, the Study Site object has a field called study_country__v, of data type Parent Object, which links it to the Study Country object. If you create a new Study Site record using Vault Loader or the API and know the ID of the desired Study Country record, you can populate the field with that ID. However, record IDs differ between Vault environments. Use a lookup table to obtain the Vault record IDs from the external_id__v fields. Alternatively, use an object lookup field in the form study_country__vr.name__v = 'United States'.
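The lookup-table approach can be sketched as a transformation step that swaps a legacy parent reference for the Vault record ID in the target environment. Here lookup is assumed to be built from an export of Study Country records (external ID to Vault ID) taken from that environment, and the column names and ID values are hypothetical. Because IDs change between Vaults, the table has to be rebuilt for each environment.

```python
def resolve_parent_ids(rows, lookup, ref_column, target_column):
    """Replace legacy parent references with Vault record IDs.

    `lookup` maps external ID -> Vault record ID for one environment.
    Raises KeyError naming the first external ID with no match.
    """
    resolved = []
    for row in rows:
        new_row = dict(row)
        external_id = new_row.pop(ref_column)
        if external_id not in lookup:
            raise KeyError(f"no Vault record for external ID {external_id!r}")
        new_row[target_column] = lookup[external_id]
        resolved.append(new_row)
    return resolved


# Usage with a hypothetical export and a fake record ID.
lookup = {"SC-US": "V00000000001"}
rows = [{"name__v": "Site 1", "country_ext_id": "SC-US"}]
sites = resolve_parent_ids(rows, lookup, "country_ext_id", "study_country__v")
```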