This article is intended for customers considering a Vault migration project. It provides suggestions and best practices for planning, developing, and executing a Vault migration.
A Vault migration refers to the loading of large volumes of data or documents into Vault. There are two common migration use cases:
When replacing a legacy system with Vault, it is common to migrate data from the legacy system into Vault. This type of migration usually occurs during the Vault implementation project for a new Vault, but can also happen when implementing a new application in an existing Vault.
Any business event that requires adding new data to Vault, such as a phased rollout, system consolidation, or a company acquisition, may require an incremental migration. Incremental migrations occur independently of an application implementation and can affect a live production Vault application.
At a high level, data migration into Vault generally requires:

- Extracting data from the source system
- Transforming the data into a format that Vault accepts
- Loading the data into Vault
- Verifying the migrated data
The complexity, effort, and timing of these steps varies based on the data volume, data complexity, system availability requirements, and tools that are used. Choosing the correct approach is the key to a successful migration project.
It is critical that you select an approach that is appropriate for the size and complexity of your migration project. Migrations can be performed by customers or with assistance from Veeva Migration Services and Certified Migration Partners who use validated tools and have experience performing Vault migrations. It is recommended that you speak with your Veeva account team to understand these options prior to starting your project.
The migration plan should consider dependencies between the Vault configuration and data preparation, as well as provide adequate time for testing and validation. It is important that you conduct performance testing to properly estimate the time it will take to complete migration activities.
Before carrying out a migration, you must inform Veeva at least three business days in advance, or at least one week for large migrations, by completing the Vault Migration Planning Form. This notifies the Veeva technical operations and support teams of your migration project and allows them to prepare. Fill out this form for any migration that includes more than 10,000 documents, more than 1,000 large documents (greater than 1 GB each, such as videos), more than 500,000 object records, or any migration where the customer or services team has concerns. Complete the form for each environment you will be migrating into.
This section identifies the common practices that should be considered when migrating documents, objects, or configuration into Vault.
Source data for a migration can come from legacy applications, file shares, spreadsheets, or even an existing Vault. The details of extracting data from its source format will depend on the system itself. Customers who are migrating from a complex source application often choose to work with a Certified Migration Partner who has experience extracting data from that application.
A key consideration for data extraction is minimizing downtime during the cutover from the legacy application to Vault; the cutover is often done over a weekend. To support this, migrate the majority of data or documents in batches beforehand, while the legacy system is still running, and then perform only a delta migration on the cutover weekend once the legacy system is turned off, extracting and loading only the data that has changed since the batch runs. If the target Vault is already live, you can use user access control to hide the batch-migrated documents until the cutover.
Data extracted from the legacy system needs to be transformed before being migrated into Vault. Vault APIs and Vault Loader accept data in comma-separated values (CSV) format. During this process it’s necessary to map data between the legacy system and Vault. Review these Data Transformation Considerations before transforming your data.
When populating document or object fields which reference object records or picklists, first ensure the reference data exists in Vault. This reference data can be linked using either object lookup fields or picklist fields. This eliminates the need for system-generated IDs for related records.
To understand what document metadata values need to be populated during a migration, review the structure of the Vault Data Model. This can be achieved by running the Vault Configuration Report or via the Document Metadata API.
Vault automatically assigns major and minor document version numbers. The major version starts at one and then increments each time a new steady state is reached. At that time the minor version resets to zero and then increments with each minor change. For example, a document reaching its first steady state becomes version 1.0, subsequent minor changes produce 1.1, 1.2, and so on, and the next steady state becomes 2.0. Some legacy systems allow users to manually assign their own version numbers. Other legacy systems start version numbers at zero instead of one. As a result, the version numbers from the legacy system may not match those for documents created in Vault.
Lifecycle names and target states must be considered when mapping states. Source documents in “In” states (In Review, In Approval, etc.), other than In Progress, should not be migrated into Vault. Vault will not apply workflows to migrated documents.
Legacy Signature Pages must be in PDF format to be migrated into Vault.
If audit trail data is required, it can be migrated in a variety of ways. It’s recommended to convert the audit data into PDF format, link it to each document, and migrate it as an archived document.
We recommend customers use either Vault Loader or a Certified Migration Partner to load data into Vault. These tools have been tested and certified as best practice.
However, if you determine that you will develop your own migration tool using the Vault API, consider the following:
The Vault Loader API endpoints or the Loader command line allow you to automate migration tasks. The Loader service handles processing, batching, and error reporting, and is developed and tested by Veeva. Using the Vault Loader API endpoints or the Loader command line can greatly reduce migration time.
Migration should be performed using Bulk APIs for data loading and data verification. Bulk APIs allow you to create a large number of records or documents with a single API call. These APIs are designed for higher data throughput and will minimize the number of calls required. Refer to the table below to see which data types have Bulk APIs.
For any migration that uses the Vault REST API, it’s recommended to set the Client ID. If any errors occur during the migration, the Client ID helps Veeva assist with troubleshooting.
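A minimal sketch, assuming Java’s built-in HTTP client, a placeholder Vault DNS and API version, and a session ID read from an environment variable:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Attach a Client ID to a Vault API call so Veeva can trace it in API logs.
public class ClientIdExample {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myvault.veevavault.com/api/v24.1/objects/documents")) // placeholder DNS/version
                .header("Authorization", System.getenv("VAULT_SESSION_ID"))
                .header("X-VaultAPI-ClientID", "acme-legacy-migration") // your integration's identifier
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```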
When migrating data via the Vault REST API, it’s important to consider API rate limits. If API rate limits are exceeded, integrations are blocked from using the API. To mitigate this, use the bulk versions of APIs whenever possible, and write migration programs so that the remaining limits are checked on each API call. If the remaining burst or daily limit comes within 10% of being exhausted, handle this by either waiting until limits are available again or stopping the migration process.
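For example, a simple guard can inspect the rate-limit response headers after each call and pause when the remaining burst allowance nears the threshold. This sketch assumes the `X-VaultAPI-BurstLimit` and `X-VaultAPI-BurstLimitRemaining` response headers and uses a fixed wait; adapt both to your tooling:

```java
import java.net.http.HttpResponse;

// Pause when fewer than 10% of burst-limit calls remain.
public final class RateLimitGuard {
    private RateLimitGuard() {}

    public static void throttleIfNeeded(HttpResponse<?> response) throws InterruptedException {
        long limit = response.headers()
                .firstValueAsLong("X-VaultAPI-BurstLimit").orElse(Long.MAX_VALUE);
        long remaining = response.headers()
                .firstValueAsLong("X-VaultAPI-BurstLimitRemaining").orElse(Long.MAX_VALUE);

        if (limit != Long.MAX_VALUE && remaining <= limit * 0.10) {
            Thread.sleep(60_000); // wait for the burst window to reset before continuing
        }
    }
}
```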
Consider creating a user specifically for performing migration activities so it’s clear that data creation and any related activities were done as part of the migration. Any record or document created will then clearly show that it was created as part of the migration.
Consider the impact on existing users when migrating data into a live Vault.
Migration is often a computing-intensive process. For large or complicated migrations, schedule migration activities during periods of low user activity, such as evenings or weekends.
When enabled, Configuration Mode prevents non-Admin users from logging into Vault. Use Configuration Mode if you need to prevent end-users from accessing Vault during a migration.
You can configure user access control to hide migrated data from existing users until the cutover is complete.
Migrating documents into Vault can be done using the Create Multiple Documents endpoint. An alternative is to use the Vault Loader Command Line Interface (CLI) or API by following the tutorial for Creating & Downloading Documents in Bulk.
When loading documents into Vault, first upload the files to the Vault file staging server. This includes the primary source files and document artifacts such as versions, renditions, and attachments. Do this well in advance, as the upload can take time. Vault Loader or one of the bulk APIs then processes the staged files to create documents in Vault.
The same files from sandbox test runs can be reused for subsequent production migrations if the files haven’t changed. To do this, re-link the file staging area from one Vault to another by contacting Support. Further details on re-linking can be found in Scalable FTP.
Document Migration Mode is a Vault setting that loosens some of the constraints Vault typically enforces so that data migration runs more smoothly. Use the Create Multiple Documents or Load Data Objects endpoints to enable this setting using the API.
To use this setting, the migration user must have the Vault Owner Actions : Document Migration permission in their security profile’s permission set.
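As a sketch, a bulk document load with Document Migration Mode enabled might look like the following, assuming the Create Multiple Documents endpoint accepts a CSV body and the mode is applied per request via the `X-VaultAPI-MigrationMode` header; the DNS, API version, and file name are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Bulk-create documents from a staged CSV with Document Migration Mode enabled.
public class MigrationModeDocumentLoad {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myvault.veevavault.com/api/v24.1/objects/documents/batch")) // placeholders
                .header("Authorization", System.getenv("VAULT_SESSION_ID"))
                .header("Content-Type", "text/csv")
                .header("Accept", "application/json")
                // Requires the Document Migration permission
                .header("X-VaultAPI-MigrationMode", "true")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("documents.csv")))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // per-row success/failure results
    }
}
```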
You should disable custom functionality (such as entry actions, custom Java SDK code, or jobs) if required. Ensure reference values, such as Lists of Values (LOVs), exist and are active if referenced in migration data.
It can take time for documents to appear in Vault searches, renditions, or thumbnails once they have been migrated in. For large migrations, document indexing can take several hours. Account for ingestion delay when verifying the existence of migrated documents in Vault.
It is common to suppress document rendition generation or provide your own renditions for Vault migrations. If you choose not to suppress renditions, it will take a significant amount of time for Vault to process large quantities of renditions. See the Rendition Status page to monitor the progress of rendition jobs.
Bulk APIs don’t exist for migrating binders and folders into Vault; therefore, allocate sufficient time for this to take place. Consider whether the existing structures are still needed after migrating the data into Vault.
After documents are migrated, jobs run that send email notifications to users, such as periodic review or expiration notices. Forewarn users in each environment that this may occur; some users may receive a large number of emails.
Migrate objects into Vault using the Create Object Records endpoint. An alternative is to use the Vault Loader Command Line Interface (CLI) or API by following the tutorial for Loading Object Records.
Record Migration Mode allows the migration of object records in non-initial states within lifecycles. Use the Create Object Records or Load Data Objects endpoints to enable this setting using the API.
To use this setting, the migration user must have the Vault Owner Actions : Record Migration permission in their security profile’s permission set.
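For illustration, a sketch of a bulk object record load with Record Migration Mode enabled, assuming the mode is applied per request via the `X-VaultAPI-MigrationMode` header as for documents; the object name, DNS, API version, and file name are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Bulk-create object records with Record Migration Mode enabled so records
// can load directly into non-initial lifecycle states.
public class MigrationModeRecordLoad {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myvault.veevavault.com/api/v24.1/vobjects/site__v")) // placeholders
                .header("Authorization", System.getenv("VAULT_SESSION_ID"))
                .header("Content-Type", "text/csv")
                .header("Accept", "application/json")
                // Requires the Record Migration permission
                .header("X-VaultAPI-MigrationMode", "true")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("site_records.csv")))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```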
Record triggers execute custom Vault Java SDK code whenever a data operation on an object occurs. If custom functionality isn’t needed during the migration, disable the record triggers to prevent them from running. These can be re-enabled once the migration is complete.
Provision a sandbox Vault from the production (or validation) Vault and perform any custom configuration. This is typically done in conjunction with an implementation. At this stage, you can determine the structure of the environment into which the data will be migrated.
Reference data, such as picklists and Vault objects, is included with the sandbox, but you will need to load any other reference data that your migration depends on. Use Test Data packages to create packages of reference data.
Perform a dry run migration to test the migration logic, data, and approximate timings. It’s not necessary to dry run full data volumes. Logic and timings can be validated using smaller datasets. If the migration fails, correct the issues in the development environment before migrating to the test environment.
Once data has been migrated into Vault, verify the data is as expected. This involves a number of different checks, such as comparing record and document counts against the source system, spot-checking migrated metadata and renditions, and reviewing the success and failure logs.
Several complications can occur when populating Vault metadata. Consider the following best practices to transform data before a migration.
CSV files used to create or update documents using Vault Loader must use UTF-8 encoding and conform to RFC4180.
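For illustration, a minimal sketch of writing a Loader CSV in UTF-8 with RFC 4180 quoting; the file name and columns are illustrative only:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Write a UTF-8 CSV where fields containing commas, quotes, or line breaks
// are wrapped in double quotes, with embedded quotes doubled (RFC 4180).
public class CsvWriterSketch {
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(Path.of("documents.csv"), StandardCharsets.UTF_8))) {
            out.println("name__v,type__v,external_id__v");
            out.println(String.join(",",
                    quote("Protocol, Version 2"), // the comma forces quoting
                    quote("Protocol"),
                    quote("LEGACY-0001")));
        }
    }
}
```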
Dates migrated into Vault must use the format `YYYY-MM-DD`.

Date/time values must be converted to Coordinated Universal Time (UTC) in the format `YYYY-MM-DDTHH:MM:SS.sssZ`, for example `2019-07-04T17:00:00.000Z`. The value must end with the `Z` UTC designator (the millisecond digits can be any values). Ensure that date/time fields map to the correct day, which may differ depending on the time zone.
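For example, a legacy local timestamp can be converted to this format with standard Java time classes; the source time zone below is an assumption for illustration:

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Convert a legacy local timestamp to the UTC format Vault expects.
public class DateConversionSketch {
    public static void main(String[] args) {
        // Assumed source time zone; confirm it matches your legacy system
        ZonedDateTime local = ZonedDateTime.of(2019, 7, 4, 10, 0, 0, 0,
                ZoneId.of("America/Los_Angeles"));

        String vaultValue = local.withZoneSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));

        System.out.println(vaultValue); // 2019-07-04T17:00:00.000Z
    }
}
```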
Some Vault metadata is case-sensitive; convert source values to match the expected case.
Metadata must not contain special characters, for example, tabs and smart quotes. These special characters can be lost when migrating data into Vault.
Saving Excel™ files in CSV format for use with Vault Loader can corrupt the file in an undetectable manner. If the file becomes corrupt, your load will fail. Failure logs contain a record of each row that has failed and are accessible by email or Vault notification. Correct the CSV files to continue loading.
If the data being migrated is multilingual, ensure your Vault is configured to support different languages.
When mapping multi-value fields, values containing commas can be entered through quoting and escaping. For example, `"veeva,,vault"` is equivalent to the single value `veeva,vault`.
Data formatting can differ per environment. For instance, line separators differ between files produced on Windows™ and on macOS™.
Format Yes/No fields as `true` or `false` when migrating using the API. This doesn’t apply to Vault Loader, as it handles boolean values regardless of case.
Remove any trailing spaces from metadata. These are commonly found after commas.
Migrate numbers in String fields as String values to preserve leading zeros and prevent their conversion to integers.
On documents or object records where Name is not unique or is system-managed, set the External ID (`external_id__v` or `external_id__c`) to relate it to the original ID used in the legacy system. Additionally, this field helps distinguish between records in success and failure logs.
Values in Long Text fields must not exceed the maximum length configured in Vault. Vault Loader does not truncate these values.
Documents and objects can reference user (`user__sys`) and person (`person__sys`) records. These records must be active in order to be referenced. If referencing people who have left the company or have had a name change, reference a person record, as it does not have to be linked to a Vault user account. User names and person names are not unique; therefore, reference these objects by their external IDs.
Many object records have relationships with other records. For example, the Study Site object has a field called `study_country__v` of data type Parent Object, which links it to the Study Country object. If you create a new Study Site record using Vault Loader or the API and happen to know the ID for the desired Study Country record, you can populate it. However, these IDs change between Vault environments. Use a lookup table to obtain the Vault record IDs from the `name__v` or `external_id__v` fields. An alternative is to use an object lookup field in the form `study_country__vr.name__v = 'United States'`.
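A sketch of the lookup-table approach, assuming the VQL query endpoint and placeholder connection details; the regex-based JSON extraction is for illustration only (use a real JSON parser in practice):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Build a name__v -> id lookup table for study_country__v records.
public class CountryLookupSketch {
    public static void main(String[] args) throws Exception {
        String vql = "SELECT id, name__v FROM study_country__v";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myvault.veevavault.com/api/v24.1/query")) // placeholder DNS/version
                .header("Authorization", System.getenv("VAULT_SESSION_ID"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "q=" + URLEncoder.encode(vql, StandardCharsets.UTF_8)))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Crude extraction of id/name__v pairs from the JSON body, assuming
        // adjacent fields; replace with a proper JSON library in real tooling.
        Map<String, String> idsByName = new HashMap<>();
        Matcher matcher = Pattern
                .compile("\"id\"\\s*:\\s*\"([^\"]+)\"\\s*,\\s*\"name__v\"\\s*:\\s*\"([^\"]+)\"")
                .matcher(response.body());
        while (matcher.find()) {
            idsByName.put(matcher.group(2), matcher.group(1));
        }

        System.out.println(idsByName.get("United States")); // Vault record ID, if present
    }
}
```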
This section provides best practices on migrating Safety data into Vault.
The primary use case for a Safety Migration is a Legacy Migration. This involves migrating Safety Cases from a legacy system to Vault. This commonly includes migrating the most recent versions of Cases, but may include migrating previous versions as well.
Safety Migration Configuration is a feature that allows a designated user to migrate safety data into Vault via ETL tools, Vault Loader, or the API, while ensuring performance and data integrity. It also allows Cases to be migrated into a live Vault without Case processing automation (such as calculations or record auto-creation) altering the migrated data.
The term Safety Migration User is used consistently in this article; it always refers to the user selected as the Migration user in the Safety Migration Configuration.
Safety Migration Configuration bypasses most triggers and actions, but only for the designated Safety Migration User. Only a small subset of key triggers required for creating object records continues to execute. This improves the performance of loading Safety Cases. See the list of Bypassed Auto-Calculations for more information.
If the user attempts to execute a trigger that is not allowed during migration, the following error message appears:
You do not have permission to modify {0} records. Contact your administrator if changes are required.
To enable the Safety Migration Configuration feature, create a Safety Migration Configuration (`safety_migration_configuration__v`) record, change its lifecycle state to Active, and assign it to a Safety Migration User.
Because the Safety Migration Configuration object is not shown in Business Admin by default, create a record by navigating directly to the object page:

`https://{vault_dns}/ui/#object/safety_migration_configuration__v`
Safety Migration Configuration records contain the following fields.
Name | Description | Type
---|---|---
`name__v` | The system automatically generates a name for the record. | System managed
`user__sys` | (Required) Select the User record that corresponds to the migration user. | Object reference to the `user__sys` object (unique)
`enabled__v` | To activate this configuration and allow this user to bypass triggers for migration, set to Yes. | Yes/No
To disable the feature, set the `enabled__v` field to No on the Safety Migration Configuration (`safety_migration_configuration__v`) records.

Vault Safety product triggers are bypassed by default when they are triggered by the Safety Migration User.
Additional code needs to be written for custom triggers to have the same behavior. Failure to do so will result in major performance issues during migrations.
The following code snippet illustrates how to bypass a trigger in migration mode:
```java
import java.util.Objects;

import com.veeva.vault.sdk.api.core.RequestContext;
import com.veeva.vault.sdk.api.core.ServiceLocator;
import com.veeva.vault.sdk.api.core.ValueType;
import com.veeva.vault.sdk.api.data.RecordEvent;
import com.veeva.vault.sdk.api.data.RecordTrigger;
import com.veeva.vault.sdk.api.data.RecordTriggerContext;
import com.veeva.vault.sdk.api.data.RecordTriggerInfo;
import com.veeva.vault.sdk.api.query.QueryResponse;
import com.veeva.vault.sdk.api.query.QueryService;

@RecordTriggerInfo(object = "case_version__v", events = {RecordEvent.BEFORE_INSERT, RecordEvent.BEFORE_UPDATE})
public class SampleTrigger implements RecordTrigger {

    @Override
    public void execute(RecordTriggerContext recordTriggerContext) {
        final QueryService queryService = ServiceLocator.locate(QueryService.class);

        // Get the current user ID from the request context
        final RequestContext context = RequestContext.get();
        final String currentUserId = context.getCurrentUserId();

        // Query the Safety Migration Configuration for enabled migration users
        final String queryMigrationUsers =
                "SELECT user__v FROM safety_migration_configuration__v WHERE enabled__v = true";
        final QueryResponse queryResponse = queryService.query(queryMigrationUsers);
        final boolean isMigrationUser = queryResponse.streamResults()
                .anyMatch(queryResult -> Objects.equals(
                        currentUserId, queryResult.getValue("user__v", ValueType.STRING)));

        // Skip custom logic when the Safety Migration User performs the operation
        if (isMigrationUser) {
            return;
        }

        // Perform remaining trigger logic
    }
}
```
The following Auto-Calculations do not calculate when using a Safety Migration Configuration:

- Imprecise Dates (`_idate__v`) to Normalized Dates (`_date__v`)

This section provides best practices for migrating clinical study data into Vault.
The primary use case for a Clinical Study migration is an Incremental Migration. This commonly involves having one set of studies and then going live with a second set of studies. In this case, you must migrate additional data to accommodate the second set of studies.
Study Migration Mode helps load studies faster and reduces downtime during migrations. We recommend using Study Migration Mode for all CTMS migrations, particularly when handling large volumes of object data. Study Migration Mode is intended to be additive with Record Migration Mode. Learn more about Record Migration Mode in Vault Help.
When a Study enters Study Migration Mode, Vault makes study-related object data for that study hidden and uneditable for non-Admin users. This locks down target studies that are being migrated while allowing users to update documents and input data for the remaining studies. See the list of objects with the Migration field for more information.
Study Migration Mode also bypasses productized triggers for the target studies, such as calculating metrics and generating related records.
Certain jobs exclude studies that are In Migration from processing.
Standard Vault to Vault Connections exclude studies that are In Migration. Vault to Vault Connection jobs continue to process updated records that were bypassed while the study was being migrated.
If your study uses an object lifecycle, Admins in your Vault must configure user actions that mark a study as In Migration. Learn more about status and archiving studies in Vault Help.
You can enable Study Migration Mode for Study records using the following methods:

- Populate the `study_migration__v` field with the value `m__v`.
- Update `study_migration__v` with the value `m__v` for all existing Study object records and related clinical object records that are within the scope of the migration.

Consider the following when conducting a Clinical Study migration:
- Vault hides studies that are In Migration (those with the `study_migration__v` field populated) from all users except Vault Owners, System Admins, and users with the Application: All Object Records: All Object Read permission.

When Study Migration Mode is enabled, Vault also bypasses the Clinical App SDK by default.
You must write additional code for a custom SDK to have the same behavior as the Clinical App SDK. Because Study Migration Mode is controlled by the `study_migration__v` field on a record, you should update the custom SDK to read this field and check if a study is in Study Migration Mode.
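As an illustration, the following sketch shows one way a custom record trigger could perform that check. It assumes the record has a `study__v` reference field to its Study (a hypothetical field name; use your object’s actual relationship) and that `study_migration__v` is a picklist containing `m__v` while a study is in migration:

```java
import java.util.List;

import com.veeva.vault.sdk.api.core.ServiceLocator;
import com.veeva.vault.sdk.api.core.ValueType;
import com.veeva.vault.sdk.api.data.RecordChange;
import com.veeva.vault.sdk.api.data.RecordEvent;
import com.veeva.vault.sdk.api.data.RecordTrigger;
import com.veeva.vault.sdk.api.data.RecordTriggerContext;
import com.veeva.vault.sdk.api.data.RecordTriggerInfo;
import com.veeva.vault.sdk.api.query.QueryResponse;
import com.veeva.vault.sdk.api.query.QueryService;

@RecordTriggerInfo(object = "subject__clin", events = {RecordEvent.BEFORE_INSERT, RecordEvent.BEFORE_UPDATE})
public class SubjectMigrationAwareTrigger implements RecordTrigger {

    @Override
    public void execute(RecordTriggerContext recordTriggerContext) {
        final QueryService queryService = ServiceLocator.locate(QueryService.class);

        for (RecordChange change : recordTriggerContext.getRecordChanges()) {
            // study__v is a hypothetical reference field linking this record to its Study
            final String studyId = change.getNew().getValue("study__v", ValueType.STRING);
            if (studyId == null) {
                continue;
            }

            // Check whether the parent Study is in Study Migration Mode
            final QueryResponse response = queryService.query(
                    "SELECT study_migration__v FROM study__v WHERE id = '" + studyId + "'");
            final boolean inMigration = response.streamResults().anyMatch(result -> {
                final List<String> mode =
                        result.getValue("study_migration__v", ValueType.PICKLIST_VALUES);
                return mode != null && mode.contains("m__v");
            });

            if (inMigration) {
                return; // bypass custom logic while the study is being migrated
            }
        }

        // Perform remaining trigger logic
    }
}
```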
The following objects have a `study_migration__v` field available for use in a Clinical Study migration:
budget__v
cdx_agreement__v
central_monitoring_event__ctms
clinical_user_task__clin
crm_activity__v
ctn__v
ctn_data__v
ctn_data_change_log__v
ctn_ip_name__v
ctn_remarks__v
ctn_site_ip__v
cycle_time__v
edl__v
edl_item__v
edl_item_template__v
enrollment_status__ctms
fee__v
fee_schedule__v
fee_schedule_template__v
fee_template__v
form_answer__v
icf_site_effective_tracking__ctms
informed_consent_tracking__ctms
metrics__ctms
metrics_over_time__v
milestone__v
milestone_package_document__v
monitored_informed_consent_form__ctms
monitored_metrics__ctms
monitored_subject__ctms
monitored_subject_event__ctms
monitored_subject_visit__v
monitoring_compliance__ctms
monitoring_event__ctms
monitoring_schedule__v
monitoring_schedule_template__v
payable_item__v
payee_override__v
payment__v
pdv__ctms
procedure__v
procedure_def__v
quality_issue__v
response__ctms
review_comment__v
safety_distribution__v
selected_site__ctms
site__v
site_fee__v
site_fee_def__v
site_sae_tracking__ctms
site_checklist__sys
site_checklist_pal__v
site_section__sys
study_arm__v
study_cohort__v
study_communication_log__ctms
study_country__v
study_critical_data__v
study_critical_process__v
study_organization__v
study_person__clin
study_product__v
study_risk__v
study_risk_assessment__v
study_risk_category__v
study_risk_mitigation__v
study_site_location__ctms
subject__clin
subject_informed_consent_form__v
team_assignment__v
trip_report_question_response__ctms
trip_report_template__ctms
visit__v
visit_def__v