Blog

Create your own data schemas using the Tetra Data Schema Library

July 23, 2024

Micael Nicollier

In a previous blog post, we discussed the benefits of using Intermediate Data Schemas (IDSs) to decouple data sources (producers) from data targets (consumers). By pairing Self-Service Pipelines (SSPs) with IDSs, you can easily create your own building blocks to dynamically integrate instruments using custom task scripts and protocols. These building blocks help you adapt to an ever-evolving landscape of analytical tools and data systems.

These custom integrations consist of a protocol, associated task scripts, and an IDS file for each specific instrument type. Historically, creating or modifying an IDS file (schema.json) was an error-prone, manual process. It involved mapping each field and undergoing multiple validation cycles to meet the publishing requirements for the Tetra Data Platform (TDP).

To streamline this process for supported schemas, we developed the Tetra Data Schema Library. This tool is now available to our customers as part of our comprehensive scientific data toolchain. You can now programmatically define custom IDSs, ensuring consistent data structures across your organization. By building on platform-compliant and domain-specific schemas, you can significantly reduce development cycles.

Tetra Data Schema Library

The Tetra Data Schema Library, distributed as ts-ids-core and ts-ids-components, is built on top of Pydantic, a well-known Python library for data validation that simplifies the handling of structured data. ts-ids-core serves as the foundation, providing built-in rule enforcement to ensure your IDSs are platform compliant and align with TetraScience’s data modeling best practices and common scientific data schemas. With a growing set of domain-specific components provided in ts-ids-components, such as those for chromatography methods, you can enforce consistency in your data models within instrument domains.

Additionally, the programmatic definitions of IDSs are not just static schemas. They can be used within task scripts to leverage your IDS’s built-in type conversion and validation rules, ensuring data integrity. Ultimately, this allows you to create production-ready SSPs faster.
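The ts-ids-core classes themselves are private, but because the library builds on Pydantic, the idea of using a schema for type conversion and validation inside a task script can be sketched with plain Pydantic. All class and field names below are illustrative, not the real ts-ids-core API:

```python
from pydantic import BaseModel, ValidationError

# Illustrative model only -- not the actual ts-ids-core API.
# A task script could validate raw instrument output against it.
class InjectionResult(BaseModel):
    sample_id: str
    volume_ul: float  # numeric strings are coerced to float

# Type conversion: "7.2" from a CSV export becomes the float 7.2.
result = InjectionResult(sample_id="S-001", volume_ul="7.2")
print(result.volume_ul)  # 7.2

# Validation: malformed data is rejected instead of silently ingested.
try:
    InjectionResult(sample_id="S-002", volume_ul="n/a")
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} error(s)")
```

Because the same model both converts and validates, a task script that emits data through it cannot produce a payload that violates the schema.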

Example workflow of how the Tetra Data Schema Library works with SSPs

How it works

The ts-ids-core and ts-ids-components packages are privately hosted Python packages published under the CC BY-NC-ND 4.0 license. TetraScience provides access to an authenticated private package feed that you can integrate into your internal package source. To access the packages, contact your customer success manager.

Once the packages are installed, programmatically creating an IDS requires only a few steps:

  1. Use ts-ids-core and its built-in common components to model your instrument's raw data.
  2. Create the minimum required fields, such as @idsNamespace, @idsType, and @idsVersion, with a single command.
  3. Add any additional fields, either from common components within ts-ids-core or custom-built ones using our provided data types.
  4. Export the IDS as JSON and publish it using ts-sdk, whose built-in rules ensure it aligns with IDS data models.
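The steps above can be sketched with plain Pydantic. The "@"-prefixed field names come from the post itself; everything else (the class name, helper fields) is illustrative and not the real ts-ids-core interface:

```python
import json
from pydantic import BaseModel, Field

# Illustrative sketch of the workflow; ts-ids-core provides
# its own base classes and helpers for these steps.
class ExampleIDS(BaseModel):
    # Step 2: the minimum required IDS fields, mapped to their
    # "@"-prefixed JSON names via aliases.
    ids_namespace: str = Field(alias="@idsNamespace")
    ids_type: str = Field(alias="@idsType")
    ids_version: str = Field(alias="@idsVersion")
    # Step 3: additional fields modeling the instrument's raw data.
    instrument_serial: str

# Step 4: export the model as a JSON schema (schema.json).
schema = ExampleIDS.model_json_schema(by_alias=True)
print(json.dumps(schema["required"], indent=2))
```

Generating the schema from code, rather than editing schema.json by hand, is what removes the manual field-mapping and validation cycles described earlier.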

Common components in ts-ids-core and domain-specific components in ts-ids-components are building blocks: common components can be reused across all IDSs, while domain-specific components serve IDSs that model particular instrument domains. These components make data access more consistent by enabling the reuse of search queries and application code.
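A minimal sketch of why shared components enable query reuse, again using plain Pydantic with made-up component and IDS names rather than the real packages:

```python
from pydantic import BaseModel

# Illustrative shared component, reused across two IDS models the
# way common/domain components are reused from ts-ids-core/-components.
class Sample(BaseModel):
    id: str
    name: str

class ChromatographyIDS(BaseModel):
    sample: Sample            # same component shape ...
    column_temperature_c: float

class BalanceIDS(BaseModel):
    sample: Sample            # ... reused here, so a query on
    mass_mg: float            # "sample.id" works against both IDSs

# Both documents expose the sample identifier at the same path.
a = ChromatographyIDS(sample={"id": "S-1", "name": "caffeine"},
                      column_temperature_c=35.0)
b = BalanceIDS(sample={"id": "S-1", "name": "caffeine"}, mass_mg=12.4)
print(a.sample.id, b.sample.id)
```

Because both IDSs embed the same component, a search query or application that reads `sample.id` needs no per-instrument branching.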

Example Tetra Data Schema

Convert existing IDSs

You can also use ts-ids-core and the provided code generation tool to convert an existing JSON schema to a programmatic IDS. This ensures compatibility with the latest TDP version and allows you to take full advantage of the functionality provided by ts-ids-core and Pydantic.
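The provided code-generation tool's interface is not public, but conceptually it turns an existing schema.json into an equivalent programmatic model. The sketch below (plain Pydantic, illustrative names, and a made-up schema fragment) shows the property being preserved, namely that the converted model declares the same required fields as the legacy file:

```python
import json
from pydantic import BaseModel, Field

# A fragment of an existing, hand-maintained schema.json (illustrative).
legacy_schema = json.loads("""
{
  "type": "object",
  "properties": {
    "@idsType": {"type": "string"},
    "sample_name": {"type": "string"}
  },
  "required": ["@idsType", "sample_name"]
}
""")

# The equivalent programmatic IDS a code generator would emit
# (illustrative; real generated code would use ts-ids-core classes).
class ConvertedIDS(BaseModel):
    ids_type: str = Field(alias="@idsType")
    sample_name: str

# The converted model's JSON schema requires the same fields.
generated = ConvertedIDS.model_json_schema(by_alias=True)
print(sorted(generated["required"]) == sorted(legacy_schema["required"]))
```

Comparing the generated schema against the legacy one is also a useful regression check after conversion.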

Get started

These packages will continue to evolve with additional common components, guided by feedback from the Tetra community.

For more information on creating your own programmatic IDSs and integrating instruments with TDP, visit our developer documentation or contact your customer success manager.