Over the last four years, TetraScience has been replatforming and engineering scientific data from hundreds of thousands of data silos. Our goal is to combat the inherent fragmentation of the scientific data ecosystem to unlock Scientific AI.
The approach we’ve taken—leveraging productization, using a data stack designed for scientific data, and adhering to a vendor-agnostic business model—has been very challenging. However, we firmly believe that our strategy is the only way to generate the large-scale, liquid, purpose-engineered datasets that Scientific AI requires.
In this blog, we’ll share where we stand today, what we’ve learned, and what we’re planning to do next.
It’s all about scientific data and it’s about all your data
TetraScience started its mission by replatforming and engineering one of the most notoriously challenging types of scientific data: instrument data. These datasets are often highly fragmented and largely trapped in vendor-proprietary or vendor-specific formats.
Over time, the Tetra library of integrations and data schemas has greatly expanded beyond instrument data and most notably includes:
- Experimental context and design via bi-directional ELN/LIMS integration
- Analysis results via apps accessed through the Tetra Data Workspace
- Data from contract research organizations (CROs) and contract development and manufacturing organizations (CDMOs)
For each endpoint system, TetraScience aims to extract as much of the scientifically meaningful data as possible. In a previous blog post, we shared why our strategy is fundamentally data-driven and AI-native in contrast to a middleware approach that is application-driven and, therefore, more limited by nature.
The largest library and highest coverage
Considering the vast array of scientific data sources within the biopharma industry, how does TetraScience’s library of productized integrations and data schemas measure up? Let's evaluate our library from three different perspectives.
For a typical biopharma
Since the beginning of 2024, three of the many biopharmaceutical companies we are partnering with have shared their instrument inventory lists with us. This allowed us to identify how to replatform their scientific data to the cloud and enable analytics and AI.
Upon analyzing their lists, we discovered that, on average, TetraScience already supported integrations for over 95 percent of their lab instruments. Thus, by and large, we could fulfill their immediate needs for lab data automation and scientific data management.
Our process of engineering scientific data involves harmonizing vendor-specific data into Tetra Data, an open, vendor-agnostic format. Initially, our data schemas supported roughly 60 percent of their instruments. The resulting Tetra Data allows these organizations to harness their scientific data for analytics and AI whenever they are ready, thereby future-proofing their data strategy.
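To make the harmonization step concrete, here is a minimal sketch of mapping a vendor-specific export row into a vendor-agnostic record. The export columns, field names, and record structure are illustrative assumptions only; they do not represent the actual Tetra Data or IDS schema.

```python
# Illustrative sketch only: the export columns, field names, and record structure
# below are hypothetical and do not represent the actual Tetra Data or IDS schema.
from datetime import datetime, timezone

def harmonize_plate_reader_row(vendor_row: dict) -> dict:
    """Map one vendor-specific result row into a vendor-agnostic record."""
    return {
        "instrument": {
            "vendor": vendor_row.get("Instrument Manufacturer"),
            "model": vendor_row.get("Instrument Model"),
            "serial_number": vendor_row.get("SN"),
        },
        "sample": {
            "id": vendor_row.get("SampleID"),
            "well": vendor_row.get("Well Position"),
        },
        "measurement": {
            "type": "absorbance",
            "wavelength_nm": float(vendor_row["Wavelength"]),
            "value": float(vendor_row["OD"]),
        },
        # Normalize the vendor's local-format timestamp to ISO 8601 UTC.
        "acquired_at": datetime.strptime(vendor_row["Read Time"], "%m/%d/%Y %H:%M:%S")
        .replace(tzinfo=timezone.utc)
        .isoformat(),
    }

# A hypothetical vendor export row, as it might appear in a CSV or report file.
row = {
    "Instrument Manufacturer": "ExampleVendor",
    "Instrument Model": "PR-9000",
    "SN": "12345",
    "SampleID": "S-001",
    "Well Position": "A1",
    "Wavelength": "450",
    "OD": "0.482",
    "Read Time": "03/15/2024 14:02:11",
}
print(harmonize_plate_reader_row(row))
```

The value of this kind of mapping is that downstream analytics and AI work against one consistent structure, regardless of which vendor produced the raw file.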
Digging into their instrument priorities, we found that our library covered 100 percent of the highest-priority instruments for data replatforming and data engineering. This is unsurprising: TetraScience's library has been developed based on customer requests and thus supports the most popular scientific use cases. As a result, it serves biopharma's expected business outcomes and provides significant value.
For a popular endpoint category
Another way to illustrate TetraScience’s coverage is to map the TetraScience library to common endpoint categories. Here are some examples:
For an end-to-end scientific workflow
Next, let's consider a scientific use case. Bioprocessing involves a series of carefully controlled and optimized steps to produce pharmaceutical products from living cells. A typical workflow is divided into three stages: upstream processing, downstream purification, and characterization and critical quality attribute (CQA) monitoring. Throughout this process, a large number and diversity of scientific data sources are used. Below, we list the endpoints from the Tetra library that cover each stage of the bioprocessing workflow end to end:
Upstream Bioprocessing (Cell Culture Fermentation)
Downstream Bioprocessing and Purification
Characterization and CQA Monitoring
The fastest-growing library
For any data sources not yet supported by TetraScience, you can read about our approach in this blog post: What customers need to know about Tetra Integrations and Tetra Data Schema. In short:
- TetraScience publishes our library and roadmap transparently so you know exactly what is currently in the library and how we plan to grow it: Data replatforming/integration library (layer 1) and Data engineering library (layer 2)
- TetraScience provides monthly newsletters announcing the latest releases for data integrations and data models: TetraConnect News
- Customers can request to add or accelerate items in the roadmap. TetraScience prioritizes requests based on criticality and impact. If there is a component to productize, TetraScience will create and maintain it for all customers.
In the last two years, TetraScience has delivered more than fifty new components or material improvements to our library every six months. With the introduction of our Pluggable Connector Framework, TetraScience will accelerate this tempo even further.
TetraScience also publishes guidelines to select customers on how to build Intermediate Data Schemas (IDSs), accelerating their ability to extend the library. For example, this video from our training team teaches users how to create their own pipelines: Tetra Product Short Cuts: Self-Service Tetra Data Pipelines.
In addition to industrializing components for our library, TetraScience has rolled out ready-to-use validation scripts as part of our GxP Package. Our verification and validation (V&V) document set is designed to help customers save as much as 80 percent of their validation effort, allowing them to focus on the “last mile validation.”
Learn more about some components of our data replatforming and engineering library:
- Benchling Notebook
- BioRad ddPCR
- Molecular Devices SoftMax Pro
- Perkin Elmer UV Winlab
- Thermo Fisher Chromeleon
- Waters Empower: Tetra Empower Agent v5.1 and v5.2
- Revvity Signals
- Benchling Notebook <> Thermo Fisher Chromeleon bidirectional integration
- Revvity Signals <> Metrohm Tiamo/Mettler Toledo LabX bi-directional integration
The only purpose-built library
Our journey will never be complete. However, we're eager to share the significant investment and work TetraScience has put into fulfilling our promise to the industry. This commitment involves combining our expertise in technology, data, and science to deliver material impact.
Understand and overcome the limitations of endpoint systems
Most of these systems are not designed to interface with data analytics and AI. Their primary function is to execute specific scientific workflows, rather than to preserve and surface information for analytics or AI. Here are some of the most challenging situations we have observed:
- Change detection is extremely difficult for common lab data systems, such as chromatography data systems (CDS). A typical CDS controls hundreds of HPLCs, holds data from thousands of projects, and supports hundreds to thousands of scientists. As a result, it can be virtually impossible to efficiently detect modifications or new runs (see the sketch after this list).
- Binary data files are a prevalent vendor choice, and they are readable only inside the vendor's own analysis software. Sometimes these vendors provide a software development kit (SDK). However, because the instrument control software must be installed alongside it for the SDK to function, it does not qualify as a true SDK. Vendors also often restrict third parties from using key libraries in the SDK.
- Data interfaces are often undocumented, incorrect, or not designed for analytics or data automation. For example, some lab data systems return incorrect or conflicting data when accessed through different interfaces, or fail to handle periodic polling on the order of minutes. Anticipating or reproducing these scenarios is often impossible without large-scale data or real lab facilities.
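To illustrate the first point, below is a minimal sketch of watermark-based change detection, the kind of incremental polling a data platform needs from an endpoint. The record shape, field names, and timestamps are assumptions for illustration; they do not describe any specific CDS or TetraScience interface. Note that this approach only works when the endpoint exposes reliable modification timestamps or change flags, which is exactly what many lab data systems lack.

```python
# Minimal sketch of watermark-based change detection. The record shape, field
# names, and timestamps are hypothetical; they do not describe any specific CDS
# or TetraScience interface.
from datetime import datetime, timezone

def detect_changes(records: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return records modified after `watermark`, plus the advanced watermark."""
    changed = [r for r in records if r["modified_at"] > watermark]
    new_watermark = max((r["modified_at"] for r in changed), default=watermark)
    return changed, new_watermark

# Simulated endpoint contents; in practice each polling cycle (on the order of
# minutes) would query the endpoint for runs modified since the last watermark.
runs = [
    {"id": "run-001", "modified_at": datetime(2024, 3, 15, 9, 0, tzinfo=timezone.utc)},
    {"id": "run-002", "modified_at": datetime(2024, 3, 15, 9, 7, tzinfo=timezone.utc)},
]

watermark = datetime(2024, 3, 15, 9, 5, tzinfo=timezone.utc)
changed, watermark = detect_changes(runs, watermark)
print([r["id"] for r in changed])  # -> ['run-002']
```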
Understand the science and the scientific purpose
A typical approach in the industry is to focus on the integration of instruments and scientific applications without considering the larger picture of the scientific use case. While having many industrialized and validated integrations and data schemas is undeniably essential, it is critical to also have the scientific workflow and purpose in mind.
- What is the scientific end-to-end data workflow of the scientist?
- Is this workflow part of a larger process?
- What does the scientist want to achieve, and what is the desired outcome?
- Which data and results are relevant to achieve it?
- What is the relevant scientific metadata?
- For which purpose does the scientist need this metadata today (e.g., search, data aggregation, analytics)?
- How might the scientist want to leverage this data later in different applications?
- What other purposes might the scientist have for the data in the future?
- What are the functional and nonfunctional requirements to fulfill the scientific use cases?
Being able to answer these questions helps create the best possible data workflows using suitable integrations and data schemas. To ensure our library is purpose-built for science, 48 percent of TetraScience's staff have a scientific background and 54 percent have advanced degrees (MS or Ph.D.).
Mimic a scientific workflow via live instrument testing
One of the most important lessons we have learned is that scientific data formats vary widely depending on the configurations, assays, hardware modules, and operating systems used. As a result, TetraScience has started to contract with various institutions to perform live testing while scientists conduct real scientific workflows. This ensures that our integrations and schemas perform as intended, delivering value to scientific workflows.
Next steps
TetraScience has invested, and will continue to invest, in differentiating capabilities for the replatforming and engineering of scientific data from hundreds of thousands of data silos. This endeavor combats the widespread fragmentation of the scientific data ecosystem. In 2024, we will:
- Continue to evolve our foundational schema component library
- Adapt our existing schemas to strengthen harmonization across specific scientific workflows and data sources
- Evolve platform architecture to accelerate expansion of the data engineering library
- Rapidly detect edge cases and remediate them through alerting and monitoring
- Deploy and manage components at scale from a centralized platform
- Perform exploratory testing focused on scientific use cases
We are on this journey together
Each of the endpoint systems holds your scientific data. In the modern data world, every organization like yours is demanding seamless access to their data and liquid data flow in and out of these systems. The "tax or tariff" that certain vendors put on your data is no longer acceptable—nor does it have to be. This endpoint-centric data ownership mindset fundamentally does not work in our era of data automation, data science, and AI. Industrializing the building blocks of data replatforming and engineering is inevitable for the industry to move forward.
TetraScience provides the vehicle for this paradigm shift. Where there is a tax or tariff on your data, we encourage every biopharma organization to take an active role in the success of its own data journey. For example, you can:
- Submit justified requests to your endpoint provider, insisting that your data be freely accessible along with the related documentation.
- Involve TetraScience in your planning process. TetraScience can help ensure that your agreements with endpoint vendors include sufficient requirements for openness and liquidity of your data generated or stored in these endpoint systems.
We are on this journey together. Contact one of our experts today.