Blog

Replatform and engineer your scientific data with the world’s largest, fastest-growing, purpose-built library

March 8, 2024

Spin Wang
Co-Founder and CTO

Over the last four years, TetraScience has been replatforming and engineering scientific data from hundreds of thousands of data silos. Our goal is to combat the inherent fragmentation of the scientific data ecosystem to unlock Scientific AI. 

The approach we’ve taken—leveraging productization, using a data stack designed for scientific data, and adhering to a vendor-agnostic business model—has been very challenging. However, we firmly believe that our strategy is the only way to generate the large-scale, liquid, purpose-engineered datasets that Scientific AI requires. 

In this blog, we’ll share where we stand today, what we’ve learned, and what we’re planning to do next. 

It’s all about scientific data and it’s about all your data

TetraScience started its mission by replatforming and engineering one of the most notoriously challenging types of scientific data—instrument data. These datasets are highly fragmented and largely trapped in vendor-proprietary or instrument-specific formats.

Over time, the Tetra library of integrations and data schemas has greatly expanded beyond instrument data and most notably includes:

  1. Experimental context and design via bi-directional ELN/LIMS integration
  2. Analysis results via apps accessed through the Tetra Data Workspace
  3. Data from contract research organizations (CROs) and contract development and manufacturing organizations (CDMOs)

For each endpoint system, TetraScience aims to extract as much of the scientifically meaningful data as possible. In a previous blog post, we shared why our strategy is fundamentally data-driven and AI-native in contrast to a middleware approach that is application-driven and, therefore, more limited by nature. 

The largest library and highest coverage 

Considering the vast array of scientific data sources within the biopharma industry, how does TetraScience’s library of productized integrations and data schemas measure up? Let's evaluate our library from three different perspectives.

For a typical biopharma

Since the beginning of 2024, three of the many biopharmaceutical companies we are partnering with have shared their instrument inventory lists with us. This allowed us to identify how to replatform their scientific data to the cloud and enable analytics and AI. 

Upon analyzing their lists, we discovered that, on average, TetraScience already supported integrations for over 95 percent of their lab instruments. By and large, we could fulfill their immediate needs for lab data automation and scientific data management. 

Our process of engineering scientific data involves harmonizing vendor-specific data into Tetra Data, an open, vendor-agnostic format. Initially, our data schemas covered roughly 60 percent of their instruments. The resulting transformed data (Tetra Data) allows these organizations to harness their scientific data for analytics and AI whenever they are ready, thereby future-proofing their data strategy.
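
To make harmonization concrete, below is a minimal sketch of what mapping a vendor-specific record into a vendor-agnostic structure can look like. The vendor export, field names, and target layout here are hypothetical illustrations only; actual Tetra Data is defined by our published, versioned Intermediate Data Schemas (IDSs).

```python
import json
from datetime import datetime, timezone

# Hypothetical vendor export: a plate-reader result in a vendor-specific layout.
# Real instrument files are often binary or proprietary; this dict stands in
# for whatever a parser extracts from them.
vendor_record = {
    "InstName": "Reader-07",
    "RunDate": "03/08/2024 14:02",
    "WellData": {"A1": "0.913", "A2": "0.887"},
}

def harmonize(record: dict) -> dict:
    """Map vendor-specific fields onto a common, vendor-agnostic structure."""
    return {
        "@idsType": "plate-reader",   # hypothetical schema identifier
        "@idsVersion": "v1.0.0",
        "system": {"name": record["InstName"]},
        "run": {
            # Normalize the vendor timestamp to ISO 8601 in UTC.
            "started_at": datetime.strptime(record["RunDate"], "%m/%d/%Y %H:%M")
            .replace(tzinfo=timezone.utc)
            .isoformat(),
        },
        "results": [
            {"well": well, "absorbance": {"value": float(v), "unit": "AU"}}
            for well, v in record["WellData"].items()
        ],
    }

print(json.dumps(harmonize(vendor_record), indent=2))
```

Once every instrument's output lands in a shape like this, search, analytics, and model training no longer require vendor-by-vendor parsing logic.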

Digging into the priorities of their instruments, we found that our library covered 100 percent of the highest-priority instruments for data replatforming and data engineering. This is unsurprising since TetraScience's library has been developed based on customer requests and thus supports the most popular scientific use cases. Therefore, it serves biopharma’s expected business outcomes and provides significant value. 

For a popular endpoint category 

Another way to illustrate TetraScience’s coverage is to map the TetraScience library to common endpoint categories. Here are some examples: 

| Instrument Technique or Endpoint Type | Vendor | Software / Instrument |
| --- | --- | --- |
| Chromatography and HPLC | Cytiva | UNICORN |
|  | Shimadzu | LabSolutions |
|  | Thermo Fisher | Chromeleon |
|  | Waters | Empower |
|  | Agilent | OpenLab |
| Mass Spectrometry | Waters | MassLynx, Empower |
|  | Thermo Fisher | Xcalibur |
|  | Sciex | Analyst |
| Plate Reader | BioTek | Gen5, Synergy |
|  | BMG LABTECH | CLARIOstar, FLUOstar, NEPHELOstar, PHERAstar, POLARstar, SPECTROstar |
|  | Bruker | OPUS |
|  | Luminex | Magpix |
|  | MSD | MESO QuickPlex SQ 120, MESO SECTOR S 600 |
|  | Revvity | MicroBeta2, TopCount, EnVision |
|  | Sartorius | Octet |
|  | Tecan | Spark, Sunrise |
|  | Thermo Fisher | CellInsight CX5, CX7, FluoroSkan, VarioSkan LUX |
|  | Unchained Labs | Stunner, Lunatic, DropSense |
|  | Wyatt | DynaPro |
| ELN/LIMS | Benchling | Notebook |
|  | BIOVIA | ONE Lab |
|  | IDBS | E-Workbook |
|  | Revvity | Signals Notebook |
|  | LabWare | LIMS |
|  | LabVantage | LIMS |
|  | Dotmatics | Studies |

For an end-to-end scientific workflow

Next, let's consider a scientific use case. Bioprocessing involves a series of carefully controlled and optimized steps to produce pharmaceutical products from living cells. A typical workflow is divided into three stages: upstream processing, downstream purification, and characterization and critical quality attribute (CQA) monitoring. Throughout this process, a large number and diversity of scientific data sources are used. Below, we list the endpoints from the Tetra library that cover each stage of the bioprocessing workflow end to end: 

Upstream Bioprocessing (Cell Culture Fermentation)

| Data Source | Vendor | Software / Model |
| --- | --- | --- |
| Bioreactors | Eppendorf | DASGIP |
|  | Sartorius | Ambr 250 (via KEPServerEX Connector) |
| Plate Readers (titer, productivity) | BioTek | ELx808, Synergy HTX, Synergy 2, Synergy H1 |
|  | BMG LABTECH | CLARIOstar, POLARstar, FLUOstar, NEPHELOstar, PHERAstar |
|  | MSD (Meso Scale Discovery) | MESO QuickPlex SQ 120, MESO SECTOR S 600 |
|  | Molecular Devices | SpectraMax |
|  | Revvity | EnVision, MicroBeta2 |
|  | Sartorius | Octet HTX, Octet Red96 |
|  | Tecan | Spark, Sunrise, Infinite 200 |
|  | Thermo Fisher | CellInsight CX5/CX7, FluoroSkan |
|  | Unchained Labs | Lunatic, Little Lunatic, DropSense |
|  | Wyatt | DynaPro DLS |
| Cell Counters / Viability | Beckman Coulter | ViCell Blu, ViCell XR |
|  | Chemometec | NucleoCounter NC-200, NC-202 |
|  | Roche | Cedex Hi-Res (via AGU SDC) |
| Chemistry Analyzer (cell viability, metabolites) | Beckman Coulter | MetaFlex (via AGU SDC) |
|  | Nova Biomedical | BioProfile FLEX2 |
|  | Roche | Cedex Bio, Cedex BioHT (via AGU SDC) |
| Liquid Chromatography (protein quality) | Thermo Fisher | Chromeleon |
|  | Waters | Empower (bidirectional capability) |

Downstream Bioprocessing and Purification

| Data Source / Target Type | Vendor | Software / Model |
| --- | --- | --- |
| Electronic Lab Notebook (ELN) | IDBS | E-Workbook |
|  | Revvity | Signals Notebook |
|  | Benchling | Notebook |
| Laboratory Information Management System (LIMS) | LabWare | LIMS |
|  | LabVantage | LIMS |
| Fast Protein Liquid Chromatography (FPLC) | Cytiva | UNICORN; ÄKTA avant, ÄKTA express, ÄKTA micro, ÄKTA pure |

Characterization and CQA Monitoring

| Data Source / Target Type | Vendor | Software / Model |
| --- | --- | --- |
| Electronic Lab Notebook (ELN) | IDBS | E-Workbook |
|  | Revvity | Signals Notebook |
|  | Benchling | Notebook |
| Laboratory Information Management System (LIMS) | LabWare | LIMS |
|  | LabVantage | LIMS |
| pH Meter | Mettler Toledo | LabX |
| Spectrophotometer | Agilent | Cary 60 |
|  | Thermo Fisher | NanoDrop 8000 |
| Plate Reader | MSD (Meso Scale Discovery) | MESO QuickPlex SQ 120, MESO SECTOR S 600 |
|  | Revvity | EnVision, MicroBeta2 |
|  | BMG LABTECH | CLARIOstar, POLARstar, FLUOstar, NEPHELOstar |
|  | Molecular Devices | SpectraMax |
|  | Tecan | Spark, Sunrise, Infinite 200 |
|  | BioTek | ELx808, Synergy HTX, Synergy 2, Synergy H1 |
|  | Sartorius | Octet HTX, Octet Red96 |
|  | Thermo Fisher | CellInsight CX5/CX7, FluoroSkan |
|  | Unchained Labs | Lunatic, Little Lunatic |
|  | Wyatt | DynaPro DLS |
| Capillary Electrophoresis | Revvity | LabChip GXII Touch |
|  | ProteinSimple | Maurice |
| Gel Imager | Bio-Rad | Gel Doc XR+ |
|  | Azure Biosystems | Imaging System |
| Light Scattering System | Malvern Panalytical | ZetaSizer |
|  | NanoTemper | Prometheus |
| High Performance Liquid Chromatography (HPLC) | Waters | Empower (bidirectional capability) |
|  | Thermo Fisher | Chromeleon |
|  | Wyatt | ASTRA SEC-MALS |
| Fast Protein Liquid Chromatography (FPLC) | Cytiva | UNICORN; ÄKTA avant, ÄKTA express, ÄKTA micro, ÄKTA pure |
| Quantitative Polymerase Chain Reaction (qPCR) | Thermo Fisher | QuantStudio 7 Pro, 7 Flex, 12K Flex, ViiA7 |
| Nuclear Magnetic Resonance (NMR) | Bruker | AVANCE 700 |
| Surface Plasmon Resonance (SPR) | Bruker | Sierra SPR32 |
| Liquid Handlers | Tecan | D300e, Fluent |
|  | Beckman Coulter | Biomek i7 |
|  | Hamilton | Microlab STAR |

The fastest-growing library 

For any data sources not yet supported by TetraScience, you can read about our approach in this blog post: What customers need to know about Tetra Integrations and Tetra Data Schema. In short:

  • TetraScience publishes our library and roadmap transparently so you know exactly what is currently in the library and how we plan to grow it: Data replatforming/integration library (layer 1) and Data engineering library (layer 2)
  • TetraScience provides monthly newsletters announcing the latest releases for data integrations and data models: TetraConnect News 
  • Customers can request to add or accelerate items in the roadmap. TetraScience prioritizes requests based on criticality and impact. If there is a component to productize, TetraScience will create and maintain it for all customers. 

Over the last two years, TetraScience has delivered more than fifty new components or material improvements to the library every six months. With the introduction of our Pluggable Connector Framework, we will accelerate this pace further. 

TetraScience also publishes guidelines to select customers on how to build Intermediate Data Schemas (IDSs), which accelerates their ability to extend the library. For example, this video from our training team teaches users how to create their own pipelines: Tetra Product Short Cuts: Self-Service Tetra Data Pipelines.
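
To give a feel for what a self-service pipeline step involves, here is a simplified sketch of a task that validates a harmonized document against an IDS-style JSON Schema before passing it downstream. The schema and function names are illustrative and are not the Tetra SDK interface.

```python
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Illustrative IDS-style schema; real IDSs are far richer and versioned.
PLATE_READER_SCHEMA = {
    "type": "object",
    "required": ["system", "results"],
    "properties": {
        "system": {"type": "object", "required": ["name"]},
        "results": {
            "type": "array",
            "items": {"type": "object", "required": ["well", "absorbance"]},
        },
    },
}

def process(raw_json: str) -> dict:
    """Parse one harmonized document and fail fast on schema drift."""
    document = json.loads(raw_json)
    try:
        validate(instance=document, schema=PLATE_READER_SCHEMA)
    except ValidationError as err:
        # Rejecting malformed documents here keeps bad data out of
        # downstream search, analytics, and AI workloads.
        raise ValueError(f"IDS validation failed at {list(err.path)}: {err.message}")
    return document
```

Validating at the pipeline boundary, rather than at query time, is what keeps a growing library of schemas trustworthy.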

In addition to industrializing components for our library, TetraScience has rolled out ready-to-use validation scripts as part of our GxP Package. Our verification and validation (V&V) document set is designed to help customers save as much as 80 percent of their validation effort, allowing them to focus on the “last mile validation.” 


The only purpose-built library 

Our journey will never be complete. However, we're eager to share the significant investment and work TetraScience has put into fulfilling our promise to the industry: combining our expertise in technology, data, and science to deliver material impact. 

Understand and overcome the limitations of endpoint systems 

Most endpoint systems were not designed to interface with data analytics and AI. Their primary function is to execute specific scientific workflows, not to preserve and surface information for analytics or AI. Here are some of the most challenging situations we have observed: 

  • Change detection is extremely difficult for common lab data systems, such as chromatography data systems (CDS). A typical CDS controls hundreds of HPLCs, holds data from thousands of projects, and supports hundreds to thousands of scientists. As a result, it can be virtually impossible to efficiently detect modifications or new runs (a minimal change-detection sketch follows this list).
  • Binary data files are prevalent, and they are often readable only inside the vendor's own analysis software. Some vendors provide a software development kit (SDK), but when the instrument control software must be installed alongside it for the SDK to function at all, it is not a true standalone SDK. Vendors also often restrict third parties from using key libraries in the SDK. 
  • Data interfaces are often undocumented, incorrect, or not designed for analytics or data automation. For example, some lab data systems can return incorrect or conflicting data if using different interfaces, or fail to handle periodic polling on the order of minutes. Anticipating or reproducing these scenarios is often impossible without large-scale data or real lab facilities.
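
To illustrate the change-detection problem from the first bullet, below is a minimal sketch that fingerprints run metadata between polls. The endpoint API and field names are hypothetical; production integrations must also handle authentication, paging, clock skew, and orders of magnitude more runs.

```python
import hashlib

def fingerprint(run: dict) -> str:
    """Hash the fields that change when a run is modified or re-processed."""
    key = f'{run["id"]}|{run["modified_at"]}|{run["result_count"]}'
    return hashlib.sha256(key.encode()).hexdigest()

def detect_changes(runs: list[dict], seen: dict[str, str]) -> list[str]:
    """Return IDs of runs that are new or changed since the last poll."""
    changed = []
    for run in runs:
        fp = fingerprint(run)
        if seen.get(run["id"]) != fp:
            seen[run["id"]] = fp  # remember the latest observed state
            changed.append(run["id"])
    return changed

# First poll: everything is new. Second poll: only the re-processed run surfaces.
seen: dict[str, str] = {}
poll_1 = [{"id": "run-1", "modified_at": "2024-03-08T10:00Z", "result_count": 12}]
print(detect_changes(poll_1, seen))  # ['run-1']
poll_2 = [{"id": "run-1", "modified_at": "2024-03-08T11:30Z", "result_count": 14}]
print(detect_changes(poll_2, seen))  # ['run-1'] again, because it changed
```

Note that this approach only works when the endpoint exposes reliable modification metadata; when it does not, which is common, detecting changes requires re-reading and hashing the underlying data itself.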

Understand the science and the scientific purpose

A typical approach in the industry is to focus on the integration of instruments and scientific applications without considering the larger picture of the scientific use case. While having many industrialized and validated integrations and data schemas is undeniably essential, it is critical to also have the scientific workflow and purpose in mind. 

  • What is the scientific end-to-end data workflow of the scientist? 
  • Is this workflow part of a larger process?
  • What does the scientist want to achieve, and what is the desired outcome? 
  • Which data and results are relevant to achieve it? 
  • What is the relevant scientific metadata? 
  • For which purpose does the scientist need this metadata today (e.g., search, data aggregation, analytics)? 
  • How might the scientist want to leverage this data later in different applications?
  • What other purposes might the scientist have for the data in the future? 
  • What are the functional and nonfunctional requirements to fulfill the scientific use cases?

Being able to answer these questions helps create the best possible data workflows using suitable integrations and data schemas. To ensure our library is purpose-built for science, 48 percent of TetraScience's staff have a scientific background and 54 percent hold advanced degrees (MS or Ph.D.). 

Mimic a scientific workflow via live instrument testing 

One of the most important lessons we have learned is that scientific data formats vary widely depending on the configurations, assays, hardware modules, and operating systems in use. As a result, TetraScience has begun contracting with various institutions to perform live testing while scientists conduct real scientific workflows. This ensures that our integrations and schemas perform as intended and deliver value to scientific workflows.

Next steps

TetraScience has invested, and will continue to invest, in differentiating capabilities for replatforming and engineering scientific data from hundreds of thousands of data silos. This endeavor combats the widespread fragmentation of the scientific data ecosystem. In 2024, we will: 

  1. Continue to evolve our foundational schema component library
  2. Adapt our existing schemas to strengthen harmonization across specific scientific workflows and data sources
  3. Evolve platform architecture to accelerate expansion of the data engineering library 
  4. Rapidly detect edge cases and remediate them through alerting and monitoring
  5. Deploy and manage components at scale from a centralized platform 
  6. Perform exploratory testing focused on scientific use cases

We are on this journey together

Each of these endpoint systems holds your scientific data. In the modern data world, organizations like yours demand seamless access to their data and liquid data flow in and out of these systems. The "tax or tariff" that certain vendors put on your data is no longer acceptable, nor does it have to be. This endpoint-centric data ownership mindset fundamentally does not work in our era of data automation, data science, and AI. Industrializing the building blocks of data replatforming and engineering is inevitable if the industry is to move forward. 

TetraScience provides the vehicle for this paradigm shift. When a vendor places a tax or tariff on your data, we encourage every biopharma organization to take an active role in its own data journey. For example, you can:

  • Submit justified requests to your endpoint provider, insisting that your data be freely accessible along with the related documentation.  
  • Involve TetraScience in your planning process. TetraScience can help ensure that your agreements with endpoint vendors include sufficient requirements for openness and liquidity of your data generated or stored in these endpoint systems.

We are on this journey together. Contact one of our experts today.