TetraScience improves SQL performance, data transformation capabilities, and role management with the latest update to its Tetra Data Platform

August 28, 2024

Naveen Kondapalli
SVP, Product and Engineering

As the life science industry's only vendor-agnostic, purpose-built solution for scientific data and AI, TetraScience requires a flexible architecture to serve a fast-growing, varied user base of scientific analysts, data engineers, and data scientists. 

Flexibility was the central reason we adopted the lakehouse architecture in April with Version 4.0 of our Tetra Data Platform, the foundation of the Tetra Scientific Data and AI Cloud™. We see an industry-wide transition underway from the hassles of file-based data to the ease of engineered data readily usable by AI and analytics tools. A lakehouse architecture powered by Delta tables accelerates our shift to analytics-optimized engineered datasets.

The lakehouse decision continues to bear fruit with Version 4.1 of the Tetra Data Platform. Data in Delta tables yields significantly better SQL querying performance, among other benefits. Data scientists and analysts can unlock value more rapidly from replatformed and engineered data produced by the scientists they support. 

How we’re evolving TetraScience’s lakehouse architecture 

With Version 4.1, customers will benefit from the following improvements in our lakehouse architecture.  

Faster SQL performance on top of Delta tables

SQL is the lingua franca for data analysts, engineers, and scientists who work with schematized data, and they expect their query results to arrive reliably and quickly. TetraScience already makes scientific metadata, Tetra Data, and other IDS data available through a SQL interface that translates a hierarchical schema (JSON) into a relational one (SQL). Introducing Delta tables optimizes the underlying storage, increasing query performance for all scientific users.
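To make the idea of translating a hierarchical schema into a relational one concrete, here is a minimal sketch in Python. It is an illustration only, not TetraScience's implementation: the document structure, the field names, and the `peaks` table are hypothetical, and SQLite stands in for the platform's SQL interface.

```python
import json
import sqlite3

# Hypothetical hierarchical record. The field names are illustrative
# assumptions, not a real IDS schema.
DOC = json.loads("""
{
  "sample_id": "S-001",
  "instrument": "HPLC-7",
  "peaks": [
    {"name": "caffeine", "retention_time_min": 3.2, "area": 1520.5},
    {"name": "theobromine", "retention_time_min": 2.1, "area": 310.0}
  ]
}
""")


def flatten_peaks(doc):
    """Turn one nested document into flat rows, one row per peak."""
    return [
        (doc["sample_id"], doc["instrument"],
         peak["name"], peak["retention_time_min"], peak["area"])
        for peak in doc["peaks"]
    ]


def load(conn, docs):
    """Create the relational table and insert the flattened rows."""
    conn.execute(
        "CREATE TABLE peaks (sample_id TEXT, instrument TEXT, "
        "peak_name TEXT, retention_time_min REAL, area REAL)"
    )
    for doc in docs:
        conn.executemany(
            "INSERT INTO peaks VALUES (?, ?, ?, ?, ?)", flatten_peaks(doc)
        )


conn = sqlite3.connect(":memory:")
load(conn, [DOC])

# Once the data is relational, an analyst can use ordinary SQL.
rows = conn.execute(
    "SELECT peak_name, area FROM peaks WHERE area > 1000 ORDER BY area DESC"
).fetchall()
print(rows)
```

The nested `peaks` array becomes one row per peak, with the parent fields repeated on each row, which is the general shape of a hierarchical-to-relational translation.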

Merging multiple datasets into a single analysis table

A common task in analytics is joining data from different sources into a single table or dataset. However, joining data can be devilishly complex, and continually updating merged tables with new information can be a chore. With Version 4.1, when a new source is added, data engineers need only update the pipeline for the latest data to appear downstream for scientists, who do not need to update their queries.
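The pattern of changing the pipeline while leaving downstream queries untouched can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's pipeline API: the `analysis` table, the source names, and `build_analysis_table` are hypothetical, SQLite stands in for the lakehouse, and sources are appended into one table rather than joined on a key.

```python
import sqlite3

# The scientist's query is written once and never edited below.
ANALYSIS_QUERY = (
    "SELECT source, COUNT(*) FROM analysis GROUP BY source ORDER BY source"
)


def build_analysis_table(conn, sources):
    """Rebuild the merged analysis table from every registered source.

    `sources` maps a hypothetical source name to (sample_id, value) rows.
    """
    conn.execute("DROP TABLE IF EXISTS analysis")
    conn.execute(
        "CREATE TABLE analysis (source TEXT, sample_id TEXT, value REAL)"
    )
    for name, rows in sources.items():
        conn.executemany(
            "INSERT INTO analysis VALUES (?, ?, ?)",
            [(name, sample_id, value) for sample_id, value in rows],
        )


conn = sqlite3.connect(":memory:")

# Initial pipeline with a single source.
sources = {"hplc": [("S-001", 1.2), ("S-002", 1.4)]}
build_analysis_table(conn, sources)
before = conn.execute(ANALYSIS_QUERY).fetchall()

# The data engineer registers a new source in the pipeline only.
sources["plate_reader"] = [("S-001", 0.8)]
build_analysis_table(conn, sources)
after = conn.execute(ANALYSIS_QUERY).fetchall()

print(before)
print(after)
```

The same `ANALYSIS_QUERY` string runs before and after the new source is registered; only the pipeline input changed.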

Scheduled workflows reduce complicated trigger conditions 

Analytics workloads, such as refreshing the data powering a dashboard, are often best managed on a schedule. TetraScience has added the capability to schedule data transformations, giving data engineers the flexibility to execute their workloads when they want without having to devise complex trigger conditions.

For example, Lab IT Managers may need a system and method performance analysis dashboard only once a week. Scheduling the data refresh weekly to match that need reduces the data processing footprint while ensuring end users have the latest data when they need it, without anyone having to reason about hundreds of record-specific updates to the underlying data.

Engineering chromatography data in the lakehouse architecture

The above benefits are available to all of our customers as part of our Early Adopter Program. Please reach out to your customer success manager if you’re interested in participating. For details, see the product documentation for the Data Lakehouse Architecture Early Adopter Program.

Additional customer benefits

Version 4.1 also improves access controls. Administration teams can now create custom roles based on nine new common access policies for roles such as developers, data users, or auditors. Customers can minimize unauthorized access with more precise permissions and create a more curated experience for the particular user role accessing TetraScience.

Custom roles configuration

Other items and benefits included in this TDP release are:

  • Local archive and delete for data acquired through File Log Agent v4.4+ (general availability on August 30th)
  • Early adopter access for a new Data Acquisition and Monitoring Dashboard to help monitor and troubleshoot data downtime issues stemming from ingestion failures or high latencies

With version 4.1, TetraScience has again put our customers’ needs at the center of our product evolution. Our multi-tenant solution now has the latest version of the Tetra Data Platform, and your customer success manager will contact you with details on upgrade schedules.