This article is intended for customers considering a Vault integration project. It provides suggestions and best practices for planning and developing a Vault integration.
Note that not all integration types and patterns apply to CDMS Vaults. Learn more about CDMS APIs.
When planning a Vault integration project, it is important to select a type and pattern that is appropriate for your business needs. Depending on those needs, it may be best to use either a single pattern or several different patterns in combination.
It is recommended that you speak with your Veeva account team to understand these options prior to starting your integration project. Integrations can be performed by customers directly, with assistance from Veeva Technical Services or by engaging our Services Partners. Veeva also has many Technology Partners who either provide integrations directly to specialist tools that complement Vault’s existing functionality, or via middleware tools that connect Vault to third-party external systems. Some middleware vendors include pre-built connectors to systems including Vault to increase implementation speed.
A Vault integration refers to linking Vault with an external system, so the systems function together to achieve a connected business process. Depending on the integration objective, this may be initiated from either system when a predefined event occurs, at which point data will flow to the other system and subsequent actions may occur. The common types are:
For Vaults where enterprise security using single sign-on (SSO) is required, SAML 2.0 should be configured so users can be authenticated against your chosen identity provider. Where SSO is required for all users, the same rules must also be applied to any integration users. Hence the session details needed to make REST API calls will need to be established by adhering to the SSO rules for the authentication provider.
For external applications connecting to Vault with SSO enabled, authentication is done using OAuth 2.0 / OpenID Connect to generate a session ID. Learn more about OAuth 2.0 / OpenID Connect.
Data Integration is the process that creates a universal set of both transactional and reference data across applications within an organisation. For each piece of data, one system is responsible for maintaining it and copying it to other systems.
This process is based on pulling the data in scheduled batches from Vault. An incremental refresh can be restricted to data that has changed since the last refresh by adding a WHERE clause for each object, so that only records whose modified date is after the last incremental refresh are returned, for example using the Vault Loader:
modified_date__v >= '2021-09-09T12:00:00.000'. Note that deleted records will not be included with this approach.
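As a rough illustration of the incremental filter described above, the following Python sketch builds a VQL query string restricted to records modified since the last refresh. The object name `product__v`, the field list, and the helper name `build_incremental_query` are hypothetical, and the timestamp format simply mirrors the example in the text.

```python
from datetime import datetime


def build_incremental_query(object_name, fields, last_refresh):
    """Return a VQL query selecting records modified at or after last_refresh."""
    # Millisecond precision, mirroring the timestamp format shown in the text.
    millis = f"{last_refresh.microsecond // 1000:03d}"
    stamp = last_refresh.strftime("%Y-%m-%dT%H:%M:%S.") + millis
    return (
        f"SELECT {', '.join(fields)} FROM {object_name} "
        f"WHERE modified_date__v >= '{stamp}'"
    )


query = build_incremental_query(
    "product__v",  # hypothetical object name
    ["id", "name__v", "modified_date__v"],
    datetime(2021, 9, 9, 12, 0, 0),
)
print(query)
```

The integration would persist the time of each successful refresh and pass it in as `last_refresh` on the next run. Deleted records still need to be handled separately, as noted above.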
This process can be extended further by triggering the pull of object data from within Vault, by using the Vault initiated notification.
Learn more about the Vault components used in this process at Vault Loader.
This pattern is based on pulling the documents in scheduled batches from Vault. An incremental refresh can be restricted to documents that have changed since the last refresh by adding a WHERE clause, so that only documents whose modified date is after the last incremental refresh are returned, for example:
modified_date__v >= '2021-09-09T12:00:00.000'. Note that deleted documents will not be included with this approach.
This pattern can be extended further by triggering the pull of documents from within Vault, by using the Vault initiated notification.
When loading data from an external system into Vault, it’s recommended to store the record identifier in either the Vault
external_id__v field or any other unique field, to be able to easily link the records. Learn more about the Vault components used in this process at VQL and Vault Loader.
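A minimal sketch of this linking approach is shown here, assuming the external records are available as Python dictionaries. The `record_id` key, the sample values, and both helper names are hypothetical; only the `external_id__v` field name comes from the text.

```python
def to_vault_rows(external_records, id_key="record_id"):
    """Copy each external record and store its identifier in external_id__v."""
    rows = []
    for record in external_records:
        row = {k: v for k, v in record.items() if k != id_key}
        row["external_id__v"] = str(record[id_key])
        rows.append(row)
    return rows


def link_by_external_id(vault_records):
    """Map external_id__v to the Vault record id for later cross-referencing."""
    return {
        r["external_id__v"]: r["id"]
        for r in vault_records
        if r.get("external_id__v")
    }


# Hypothetical sample data for illustration only.
rows = to_vault_rows([{"record_id": 1001, "name__v": "Aspirin"}])
links = link_by_external_id([{"id": "00P000000000101", "external_id__v": "1001"}])
print(rows, links)
```

The first helper prepares rows for loading, and the second turns queried Vault records into a lookup table so later updates can find the matching Vault record without another search.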
When loading documents from an external system into Vault, it’s recommended to store the external document’s identifier in the Vault
external_id__v field or any other unique field, to be able to easily link the documents.
Some integrations require content from one system to be embedded in another, or for control to pass to another system, in order to offer users a more seamless experience.
The merging of the content from Vault with the content from the external systems is done within the browser to provide a more seamless experience for the user.
Current Vault context can also be passed to the external systems to make the embedded content more meaningful to the user or record in context as it relates to object or document data. The Vault session can also be passed securely to enable additional API calls via the REST API. Learn more about securely sending the session.
As with the embedded content pattern, context information can be passed in the link. Learn more about web actions.
Object fields can also be created containing hyperlinks to other websites using formula fields. Learn more about formula fields on objects.
For integrations that need to display Vault documents in an external website, this can be done using the Vault external document viewer, following the pattern shown below:
Many integrations are process related, whereby an event in one system needs to trigger a notification to a second system, so that some form of action can take place there. The second system may then in turn need to trigger a notification to the first system, and so on. Various patterns exist depending on which system initiates the process and whether the events are user initiated, automatic or scheduled from within Vault, or controlled from within an external system.
Vault Java SDK provides a number of different SDK Entry Points from which to initiate custom business logic that notifies the external system that something has happened, along with the context of the data. These SDK Entry Points include (but are not limited to):
The mechanism for notifying the external systems depends on the use case:
last_modified_date__v or other VQL filters.
The pattern for integrating from either an external system or a middleware system to Vault is to use the Vault REST API. Hosted integrations built this way can both push data to and pull data from Vault. Both bulk and asynchronous processing are supported for best performance. Java-based applications should utilize the Vault API Library (VAPIL):
A Vault to Vault Connection relates to integrating data, documents and processes between Vault application families, either using productized connectors or custom integrations.
Standard productised connections are available between Vaults within the same domain to support specific business processes. These connections can be enabled through configuration without the need to develop custom solutions. Learn more about the latest standard Vault connections.
The pattern for enabling content from one Vault to be used in another Vault is done using CrossLinks. These can either be created from within the Vault UI or created automatically using either a standard Vault Connection or using Vault Java SDK code. Learn more about CrossLink documents.
This pattern uses custom Vault Java SDK code to send Spark messages that asynchronously notify another Vault that data has changed, placing each message in an inbound Spark message queue. The message is then processed by a Spark message processor, which performs a callback to the source Vault to retrieve details of any data that has changed since the previous load. Vault Connections and integration rules can be utilized to configure what data to pull between the Vaults.
Where the data being transferred may contain multiple records, it’s recommended to use SDK jobs both outbound and inbound to batch records for bulk processing, as it’s far more efficient than processing individual records. This is achieved by scheduling the jobs to run at regular intervals, for instance every five minutes after a change has occurred.
Should any errors occur when processing records, such as a record missing a mandatory field, user exceptions can be logged for the integration for reporting to users.
When transferring documents it’s good practice to use cross-links rather than copying the files to ensure a single version of the truth across applications. Learn more about linking documents.
Custom connections can also be built using a middleware solution, using the outbound and inbound processes for handling documents and data. Veeva also partners with multiple certified technology partners who have validated tools for integrating with Vault.
Data lakes and data warehouses are commonly used by enterprise customers either as a central store for cross application data on which to perform analytics, or as an integration hub and single source of truth for multiple applications. Feeding data from Vault into data lakes and data warehouses for these purposes typically follows one of these patterns:
The most efficient method is to pull the data for objects and documents in batches using Scheduled Data Extract. It provides a daily incremental extract of any records that have been created, updated, or deleted in the last 24 hours, via either the file staging server or via an S3 bucket. It’s also the most efficient method for obtaining details of deleted records and audit trail.
By default Scheduled Data Extract files are asynchronously written to the Vault’s file staging server, in the form of CSV files containing details of any changes that have occurred in the last 24 hours. Middleware solutions can use the File Staging REST API to retrieve the CSV files, before transforming the data and loading it into the Data Lake / Data Warehouse.
Alternatively the Scheduled Data Extract can be configured to write the daily CSV files to an external S3 bucket instead. From there the Data Lakes and Data Warehouses can load the changes directly, usually without the need to write code.
Scheduled Data Extracts do not provide a baseline extract, so this will need to be done as a one-off incremental refresh.
In the event data refreshes are needed more frequently than every 24 hours, a custom data extract can be built to pull data from Vault using Vault Loader extract, either via the Vault REST API or the command line tool. This can be used to complement the Scheduled Data Extract. The pattern for how to do this can be found at Extracting Vault Object Data From Vault.
Migrations refer to loading large volumes of data or documents into Vault. Migrations typically occur as a one-off project when migrating data from legacy systems or as an incremental migration. Learn more about Vault migrations.
Log Analysis and Security Information Event Management systems are often used by enterprise customers to provide an analysis of system usage and event monitoring for security purposes. The types of Vault audit trail logs include documents, objects, system, domain and login.
The pattern for extracting audit logs involves setting up a scheduled extract of the audit data using the process described in Batch Data Extracts.
Alternatively, log information can be retrieved programmatically from Vault using the Audit History REST API.
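As a hedged illustration of an audit retrieval call, the sketch below only builds the request URL and makes no network call. The `/audittrail/{type}` path, the `start_date` and `end_date` parameters, and the audit type names are recalled from the public API reference and should be verified against the API version in use. The host name and version string are placeholders.

```python
from urllib.parse import urlencode

# Assumed audit trail type names, matching the log types listed in the text.
AUDIT_TYPES = {
    "document_audit_trail",
    "object_audit_trail",
    "system_audit_trail",
    "domain_audit_trail",
    "login_audit_trail",
}


def build_audit_url(vault_dns, api_version, audit_type, start_date, end_date):
    """Build an audit trail request URL for the given type and date range."""
    if audit_type not in AUDIT_TYPES:
        raise ValueError(f"unknown audit trail type: {audit_type}")
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"https://{vault_dns}/api/{api_version}/audittrail/{audit_type}?{query}"


url = build_audit_url(
    "myvault.veevavault.com",  # placeholder Vault DNS
    "v24.1",                   # placeholder API version
    "login_audit_trail",
    "2021-09-09T00:00:00Z",
    "2021-09-10T00:00:00Z",
)
print(url)
```

A log analysis or SIEM feed would typically issue this request on a schedule, moving the date window forward after each successful retrieval.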
Vault comes with a broad range of components to suit the different types and patterns of integration. The common components are:
The Vault REST API provides an extensive set of interfaces for pushing data into and pulling data from Vault. It also supports bulk and asynchronous processing. Learn more about the Vault REST API and the list of features in the latest version of API.
The Vault API Library (VAPIL) is an open-source Java library for the Vault REST API that includes coverage for all Platform APIs. Support for this is handled through our Developer Community. Learn more about VAPIL.
VQL (Vault Query Language) is used to access, retrieve, filter, and interact with Vault data, by running queries against the Query API, using an SQL-like language that is tailored to Vault. VQL queries provide an efficient way to retrieve Vault data in bulk for integrations through a single API call. Learn more about VQL.
MDL (Metadata Definition Language) is used to manage Vault configuration. Like DDL (Data Definition Language) in databases, you can use MDL to create, describe (read), update, and drop (delete) Vault components that make up its configuration. Learn more about MDL.
Vault allows you to schedule daily data export jobs, to push object records, audit history, and document metadata directly to your File Staging Server’s export folder or Amazon S3 Bucket. Vault exports extracted data using a CSV file per component, containing incremental data that has changed or been deleted since the previous day’s export. Learn more about Scheduled Data Export.
Each Vault comes with its own file staging server, which is a temporary storage area for files you’re uploading to or extracting from Vault and is widely used in integrations. Learn more about Vault’s File Staging Server.
Vault Loader allows you to load data to your Vault or extract data from your Vault in bulk. Loader is particularly useful during integrations and migrations where large numbers of records are being transferred. Due to the automatic nature of integrations Vault Loader is typically used via the API. Learn more about Vault Loader and the Loader API.
Configuration migration packages allow the migration of configuration changes or test data between two Vaults, by exporting configuration data in a Vault Package File (VPK) and then importing it into another Vault. This feature is particularly useful when your organization needs to configure and test in a sandbox Vault, and then move those configurations into a production Vault. Learn more about Configuration Migration Packages.
For integrations where there is a need to embed content from an external website or service within the Vault user interface this can be done by configuring Web Tabs, which are a type of Custom Tab. The page associated with the specified URL is displayed in an iframe within a Tab within the Vault UI. For further context it’s also possible to pass details of the Vault, session, and user parameters, that can in turn be used to make Vault REST API calls back to Vault, to retrieve further context-sensitive business data. For an example of using Web Tabs and passing the Vault session ID, see Vault Kanban Board Demo.
PromoMats and MedComms Vault customers can embed Vault documents into an external website or as a standalone using the external document viewer. This is only available with the Public Distribution and View-Based User licence type features enabled. Learn more about Vault External Document Viewer.
Admins may add web actions to the document or object record Actions menu. Web actions can invoke context-sensitive business logic or integrate with external systems and web sites. Learn more about web actions.
For integrations where there is a need to call an external URL from within a job without creating code, this can be done by configuring a web job. Learn more about defining jobs to call external URLs.
The Vault Java SDK is a powerful tool in the Vault Platform, allowing developers to extend Vault by implementing custom code such as triggers, actions, jobs, Spark messages and processors to be able to implement SDK integrations. Learn more about Vault Java SDK.
With the Vault Java SDK, you can build custom Vault SDK integrations to automate business processes across different Vaults or with an external system. Spark messaging allows your Vault to send messages from a Vault extension, and HTTP callout allows you to call back for any data you need. These operations are performed asynchronously, allowing performant and seamless integration. Learn more about SDK integrations.
When Single Sign-on (SSO) is enabled for a user, Vault does not validate that user’s password. Instead, Vault relies on an external identity provider to authenticate users. Vault supports SSO using Security Assertion Markup Language (SAML) 2.0 for both Service Provider (SP) and Identity Provider (IdP) initiated profiles. Learn more about Configuring SAML Profiles.
Client applications that authenticate using enterprise authentication servers can authenticate user accounts using an OAuth 2.0 / Open ID Connect access token to obtain a Vault Session ID. To enable this it’s necessary to first Configure an OAuth 2.0 / OpenID Connect Profile in Vault. The user can then log in using the OAuth 2.0 / OpenID Connect API.
This topic identifies the common best practices that should be considered when integrating with Vault.
If any custom configuration is required in Vault as part of the integration, this should be documented. Custom configuration may include custom tabs, documents, objects, lifecycles, connections, integration points, Vault Java SDK code, data and more. Creating export VPKs containing the custom configuration also provides a good way to move it between Vaults.
To make the integrations easily maintainable, integration-specific Vault settings should be made configurable within the integrated solution wherever possible to avoid hardcoding changes and any resulting recompilation and revalidation of the solution.
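One way to keep such settings out of the code is to read them from the environment at start-up. The sketch below is illustrative only: the variable names (`VAULT_DNS`, `VAULT_API_VERSION`, `VAULT_BATCH_SIZE`), the defaults, and the `IntegrationSettings` class are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationSettings:
    vault_dns: str
    api_version: str
    batch_size: int


def load_settings(env=os.environ):
    """Read integration settings from a mapping such as os.environ."""
    return IntegrationSettings(
        vault_dns=env["VAULT_DNS"],                        # required, no default
        api_version=env.get("VAULT_API_VERSION", "v24.1"),  # placeholder default
        batch_size=int(env.get("VAULT_BATCH_SIZE", "500")),
    )


# A plain dict stands in for the real environment in this example.
settings = load_settings({"VAULT_DNS": "myvault.veevavault.com"})
print(settings)
```

Changing the target Vault or API version then becomes a deployment change rather than a code change, which avoids recompiling and revalidating the solution.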
Bulk processing should always be used whenever possible:
The Vault Loader service is a product service and has been developed with best practices and tested by Veeva. Taking advantage of the Loader API when transferring data into and out of Vault can greatly reduce the development time, as it handles processing, batching, and error reporting.
Integrations should also be done using Bulk APIs for data loading and data verification. Bulk APIs allow you to process many records or documents with a single API call. These APIs are designed for higher data throughput and will minimize the number of calls required.
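To show the idea of grouping records so that each bulk call carries many of them, here is a small batching helper. The default batch size of 500 is an assumption for illustration, and the actual per-call limit should be taken from the API documentation for the endpoint being used.

```python
def chunked(records, batch_size=500):
    """Yield successive batches of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 1200 dummy records become three bulk calls instead of 1200 single calls.
batches = list(chunked(list(range(1200))))
sizes = [len(batch) for batch in batches]
print(sizes)
```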
VQL or Vault Query Language uses an SQL-like statement to be able to retrieve multiple records in a single Query API call. This is a far better alternative to making repeated calls to individual object APIs and should always be used wherever possible.
When either an object API or VQL query returns multiple records, Veeva paging should be used. This prevents the need for having to manually re-execute the cursor for each page and hence will result in far faster retrieval of data. Learn more about how to limit, sort, and paginate results.
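The paging approach can be sketched as a generator that follows the link to the next page until none remains. This assumes each response is decoded JSON with a `data` list and a `responseDetails.next_page` path, which is how query responses are commonly shaped but should be confirmed for the API version in use. The `fetch_page` callable is a hypothetical stand-in for the HTTP request.

```python
def iter_all_records(first_page, fetch_page):
    """Yield every record, following next_page links until none remain.

    fetch_page is a caller-supplied function that takes a next_page path
    and returns the decoded JSON response for that page.
    """
    page = first_page
    while True:
        yield from page.get("data", [])
        next_page = page.get("responseDetails", {}).get("next_page")
        if not next_page:
            break
        page = fetch_page(next_page)


# Fake pages stand in for real API responses.
pages = {"/page2": {"responseDetails": {}, "data": [{"id": "3"}]}}
first = {
    "responseDetails": {"next_page": "/page2"},
    "data": [{"id": "1"}, {"id": "2"}],
}
ids = [record["id"] for record in iter_all_records(first, pages.__getitem__)]
print(ids)
```

Because the generator requests a page only when the previous one is exhausted, the caller can stop early without fetching the remaining pages.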
For integrations which require the loading or retrieval of large numbers of documents, each Vault comes with its own File Staging Server to speed up this process and to limit the number of API calls being made. The recommended way to access the File Staging Server is using either the File Staging API or file staging command line interface.
Where reference data is used between systems, caching should be used. This prevents the need for potentially having to repeatedly retrieve the same reference data.
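A simple time-based cache is one way to do this. The sketch below is a generic illustration and not a Vault feature: the `ReferenceCache` class, the one-hour lifetime, and the `load_country` loader are all hypothetical.

```python
import time


class ReferenceCache:
    """Cache loader results per key and reload them after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=3600.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (time loaded, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._loader(key))
            self._entries[key] = entry
        return entry[1]


loads = []


def load_country(code):
    # Stand-in for an API call that retrieves reference data.
    loads.append(code)
    return {"code": code}


cache = ReferenceCache(load_country)
first_value = cache.get("US")
second_value = cache.get("US")  # served from the cache, no second load
print(loads)
```

The lifetime should reflect how often the reference data actually changes, since a long lifetime trades freshness for fewer API calls.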
When passing data via the Vault REST API, it’s very important to ensure API rate limits are considered, because integrations that breach the limits will be throttled. To reduce the risk of breaching the limits, bulk versions of APIs should always be used wherever possible.
In order to enable the cross referencing of data between Vault and the integrated system, it is recommended to store each system’s record identifiers in the other system, wherever possible. For instance, if a Vault document is copied into the application, the Vault Document ID should be stored within the integrated system. Conversely, documents/objects in Vault can be used to store the external IDs as metadata properties.
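One possible way to stay under the limits is to inspect the remaining allowance reported on each response and slow down before it runs out. The sketch below assumes the response carries an `X-VaultAPI-BurstLimitRemaining` header. The threshold and pause values are arbitrary illustrations, not recommended settings.

```python
def throttle_delay(headers, low_water_mark=200, pause_seconds=5.0):
    """Return how long to pause before the next call, in seconds.

    headers is a plain dict here; real HTTP clients usually expose a
    case-insensitive mapping instead.
    """
    remaining = headers.get("X-VaultAPI-BurstLimitRemaining")
    if remaining is None:
        return 0.0  # header absent, so there is nothing to act on
    return pause_seconds if int(remaining) < low_water_mark else 0.0


print(throttle_delay({"X-VaultAPI-BurstLimitRemaining": "50"}))
print(throttle_delay({"X-VaultAPI-BurstLimitRemaining": "1500"}))
```

An integration would call this after each response and sleep for the returned time, which spreads calls out before throttling is triggered.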
Where possible it is recommended to use a named account for a Vault session within an integration. Using named accounts ensures that the user in question will have the appropriate permissions on the object.
Within Vault, it is possible to call services within third-party systems by calling a service URL from within web actions, web tabs and web sections. When this method is used, the SESSION_ID of the currently logged-in user should be posted to ensure it is secure. Learn more about sending session IDs with a post message.
Once a session has been established, it is recommended to keep using the session for API calls rather than establishing new sessions, by periodically calling the session keep alive API. Sessions do, however, time out after a period of inactivity or after 48 hours. In the event this occurs, a mechanism needs to be established to re-authenticate a user before any further API calls can be made. Learn more about the session keep alive API.
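The decision between reusing a session, refreshing it, and re-authenticating can be sketched as a small tracker. The 48-hour maximum comes from the text above. The 20-minute inactivity timeout is an assumption, since the actual value depends on Vault configuration, and the `SessionTracker` class itself is hypothetical.

```python
import time


class SessionTracker:
    """Decide whether a session can be reused, refreshed, or must be replaced."""

    def __init__(self, idle_timeout=1200.0, max_lifetime=48 * 3600.0,
                 clock=time.monotonic):
        self._idle_timeout = idle_timeout  # assumed inactivity timeout, seconds
        self._max_lifetime = max_lifetime  # 48 hours, as stated in the text
        self._clock = clock
        self._created = self._last_used = clock()

    def touch(self):
        """Record that an API call or keep-alive call just succeeded."""
        self._last_used = self._clock()

    def next_action(self):
        now = self._clock()
        idle = now - self._last_used
        if now - self._created >= self._max_lifetime or idle >= self._idle_timeout:
            return "reauthenticate"
        if idle >= self._idle_timeout / 2:
            return "keep_alive"
        return "reuse"


# A fake clock makes the behaviour visible without waiting.
now = [0.0]
tracker = SessionTracker(clock=lambda: now[0])
actions = []
for moment in (10.0, 700.0, 1300.0):
    now[0] = moment
    actions.append(tracker.next_action())
print(actions)
```

In a real integration, `touch` would be called after every successful API or keep-alive call, and a `reauthenticate` result would trigger a fresh login before any further calls are made.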
If multiple concurrent Vault Sessions (e.g. multi-threading, parallel instances) are used by the integration, it will be necessary to consider how the integration manages this. Where possible a single session should be reused.
When authentication is carried out via the Vault REST API using one of the auth API commands, it is necessary to check that the “vaultId” returned in the response is for the intended Vault. This is necessary because if the specified Vault isn’t accessible for some reason (e.g. scheduled maintenance) and the user has access to multiple Vaults, they may be authenticated to another of their Vaults instead. Without this check, they could inadvertently make changes to the wrong Vault with subsequent API calls.
The error handling strategy should be defined for each part of the integration (e.g. UI, background process). It’s important that any errors are suitably trapped, reported and handled consistently. For instance, should an error occur in a UI, a suitable message should be displayed to the user, along with a mechanism for drilling into the precise error, such as displaying a full stack trace.
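A minimal sketch of this check is shown here, assuming the authentication response has already been decoded into a dictionary containing `responseStatus`, `sessionId` and `vaultId` fields. The `verify_vault` helper, the exception class, and the sample values are hypothetical.

```python
class WrongVaultError(RuntimeError):
    """Raised when authentication succeeded against an unexpected Vault."""


def verify_vault(auth_response, expected_vault_id):
    """Return the session ID only if it belongs to the intended Vault."""
    if auth_response.get("responseStatus") != "SUCCESS":
        raise RuntimeError("authentication failed")
    actual = auth_response.get("vaultId")
    if actual != expected_vault_id:
        raise WrongVaultError(
            f"authenticated to Vault {actual}, expected {expected_vault_id}"
        )
    return auth_response["sessionId"]


# Hypothetical decoded response for illustration.
response = {"responseStatus": "SUCCESS", "sessionId": "abc123", "vaultId": 1001}
session_id = verify_vault(response, expected_vault_id=1001)
print(session_id)
```

Raising an exception rather than returning a flag means a mismatched session can never be passed on to later API calls by accident.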
Working with distributed systems, temporary errors sometimes occur such as brief network outages or unavailability of a downstream system. This will result in the need to be able to either resume or retry the transfer of data. Techniques such as implementing retry logic with an exponential backoff and using idempotency keys to ensure data is only transferred once, can be key aspects of a successful error handling strategy.
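The retry technique mentioned above can be sketched as follows. This is a generic illustration rather than a Vault API: `TransientError`, `call_with_retry` and the delay values are hypothetical, and how the receiving system uses the idempotency key to discard duplicates is outside the sketch.

```python
import time
import uuid


class TransientError(Exception):
    """A temporary failure, such as a brief network outage."""


def call_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(key), retrying transient failures with doubling delays."""
    # The same key is sent on every attempt so the receiver can detect repeats.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return operation(idempotency_key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)


delays, keys_seen = [], []


def flaky_transfer(key):
    # Simulated transfer that fails twice before succeeding.
    keys_seen.append(key)
    if len(keys_seen) < 3:
        raise TransientError("simulated network outage")
    return "transferred"


# delays.append replaces time.sleep so the example runs instantly.
result = call_with_retry(flaky_transfer, sleep=delays.append)
print(result, delays)
```

Production code would usually add random jitter to each delay so that many clients recovering from the same outage do not retry at the same moment.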
It’s also necessary to consider what happens if an error occurs midway through a process, leaving data in an inconsistent state. In these cases it will be necessary to either recover the data or resume the previously failed call.
Error logging should be possible within the integrated system, in order to trace any errors that could occur. Please note that any Vault API calls will automatically be logged within Vault. To be able to determine the integration they originated from, the Client ID must be passed, as discussed below.
In any integrations that use the Vault REST API, it’s recommended to set the Client ID, so that should any errors occur during the integration, Veeva will be better able to assist in determining what may have gone wrong.
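As an illustration, the Client ID can be attached to every request through a shared header builder. This sketch assumes the `X-VaultAPI-ClientID` header name and that the session ID is sent in the `Authorization` header. The client ID value and the `build_headers` helper are hypothetical.

```python
def build_headers(session_id, client_id):
    """Return the headers to send with every Vault REST API call."""
    return {
        "Authorization": session_id,
        "X-VaultAPI-ClientID": client_id,  # identifies the calling integration
        "Accept": "application/json",
    }


# Hypothetical values for illustration only.
headers = build_headers("abc123", "acme-it-docsync-server")
print(headers)
```

Building the headers in one place makes it harder for an individual call to omit the Client ID.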
A Sandbox Vault will need to be created from the production (or validation) Vault using the Administering Sandbox Vaults tools and any configuration customizations will need to be created. This would typically be done in conjunction with an implementation consultant. It will also be necessary to link the Vault to the Third-Party system Sandbox and populate any integration specific configuration settings.
A full set of tests should be carried out to verify the integration logic and data, and to confirm that any error conditions are successfully handled. If the integration tests fail, any issues should be corrected in the development environment before being reapplied to the test environment.