Scientific data often resides in complex systems like the Waters Empower Chromatography Data System (CDS). After decades of development, Empower CDS is filled with nuances that complicate enterprise-scale data management and analytics. Its tightly integrated toolkit is designed for operational tasks, not for analytics.
To overcome these obstacles, TetraScience has developed and continually refined an enterprise-ready solution that enables seamless access to chromatography data for analytics applications. This solution ensures data integrity, low latency, and resilience across analytical process development, product supply, and quality control departments.
Additionally, it enables administrators to fine-tune data processing to prioritize time-sensitive data with lower latency while supporting enterprise deployments of any scale.
Challenges in Chromatography Data Acquisition for Analytics
Companies that need reliable, low-latency data acquisition from enterprise CDS deployments encounter several challenges when using that data for analytics. These challenges are often coupled and compete with one another: optimizing for one metric can result in trade-offs or suboptimal performance in another. Other factors, such as vendor-supported interfaces and the IT infrastructure of enterprise customers, impose further constraints.
Over the years, TetraScience has learned to balance the following concerns:
- Understanding of Source Data and Relationships: Empower data can be complex, with the same information (e.g., sample details) appearing in multiple locations (e.g., injection, results, and signed-off data) across the Empower ToolKit response. To simplify data consumption, TetraScience standardizes this structure, ensuring clarity and documentation to help users understand exactly where and how data is used.
- Data Integrity (“Complete” in ALCOA++): The Tetra Empower Agent uploads data to the Tetra Data Platform (TDP) only after an injection is marked as "ready." It continuously monitors the injection's status until acquisition is complete. During the data acquisition buffering stage, or when an injection fails because of a user abort or instrument error, the agent postpones injection generation until the injection reaches a ready state. We have developed a comprehensive decision-making metric for managing this process, which prevents downstream decisions from being made on potentially incomplete data.
- Throughput: A single Tetra Empower Agent can monitor over 5,000 Empower projects, ingesting up to 65,000 injections or 240 GB of data daily. This scalability addresses potential performance challenges, ensuring that large Empower installations can operate smoothly without data retrieval slowdowns.
- Reliability: The Tetra Empower Agent includes a robust fault tolerance mechanism designed to gracefully handle unexpected errors in Empower software and issues with network connections.
- Prioritization and Selective Project Ingestion: Empower instances often contain hundreds or thousands of projects. Users can selectively ingest only necessary projects, maintaining the original Empower project structure. In addition, the Tetra Empower Agent can automatically capture data from new projects as they are added, requiring no manual intervention.
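The "upload only when ready" rule in the Data Integrity item can be illustrated with a minimal Python sketch. This is not the Tetra Empower Agent's implementation: the `InjectionState` values, the `Injection` class, and both helper functions are hypothetical names chosen for illustration, and the real decision-making metric is not described in this post.

```python
from dataclasses import dataclass
from enum import Enum


class InjectionState(Enum):
    # Hypothetical states; the agent's real state model is not documented here.
    ACQUIRING = "acquiring"
    BUFFERING = "buffering"
    FAILED = "failed"
    READY = "ready"


@dataclass(frozen=True)
class Injection:
    injection_id: int
    state: InjectionState


def select_uploadable(injections):
    """Return only the injections whose state is READY."""
    return [inj for inj in injections if inj.state is InjectionState.READY]


def select_pending(injections):
    """Return the injections to keep monitoring (anything not yet READY)."""
    return [inj for inj in injections if inj.state is not InjectionState.READY]


batch = [
    Injection(1, InjectionState.READY),
    Injection(2, InjectionState.BUFFERING),
    Injection(3, InjectionState.FAILED),
    Injection(4, InjectionState.READY),
]
to_upload = select_uploadable(batch)   # injections 1 and 4
to_recheck = select_pending(batch)     # injections 2 and 3
```

Keeping the gate as a pure filter over observed states makes the rule easy to test in isolation: nothing that is still buffering or has failed can reach the upload step.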
Our Multi-Year Engineering Journey
TetraScience's efforts to unlock and optimize Empower CDS data spanned several critical phases:
- 2019–2020: We maximized data extraction capabilities by enhancing memory allocation and optimizing data fetching processes. Key accomplishments included retrieving comprehensive injection-related data fields, Empower user accounts (e.g., users, user types, and user groups) for enhanced auditing, and Empower Audit Trails (e.g., System Audit Trails, Project Audit Trails, and Message Centers).
- 2021–2022: Introduced a multi-process architecture to improve throughput. Added bi-directional communication by allowing users to create Empower Sample Set Methods from TDP. This design enhanced data acquisition speed and safeguarded data integrity across live experiments and mass spectrometry data points.
- 2023–2024: Focused on enterprise-grade latency tuning by introducing granularity configuration, reducing processing times, and handling native Empower errors. Added project archive and restore, resilience and observability improvements, and the Chromatography Insights App for comprehensive data analysis. For further details, read our blog posts on the Tetra Empower Agent v5.1 and v5.2 releases.
- 2025 and Beyond: Our roadmap includes further improvements to reduce latency in acquiring Empower injection data and the development of new data applications for enhanced insights.
Empower Data Acquisition Deployment Architecture
Our solution ensures a seamless, automated end-to-end data flow, enabling efficient data acquisition and analysis. Here’s how the process unfolds:
- Sample Registration: Scientists register samples and generate sample lists using an electronic lab notebook (ELN) or laboratory information management system (LIMS).
- Sample Metadata Entry: Sample information is manually entered into Empower CDS or automatically transferred from the ELN/LIMS.
- Automated Data Detection: The Tetra Empower Agent continuously monitors Empower CDS for new data. Once sample injections are ready, the agent automatically detects and acquires the data.
- Data Engineering: The agent sends the Empower data to TDP, where it is engineered into an open, vendor-agnostic format according to an Intermediate Data Schema (IDS) and enriched with metadata.
- Searchable Data: The engineered data in TDP is indexed and made searchable through the Search API, SQL, or EQL.
- Analysis via Data Apps (optional): Finally, the structured Empower data can be pushed or pulled into customer-specific data apps for visualization or downstream analysis.
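The Data Engineering and Searchable Data steps above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the raw field names (`SampleName`, `InjectionId`, `Project`), the flat output record, and the in-memory `search` helper are hypothetical and do not reflect the actual IDS or the TDP Search API.

```python
def to_vendor_neutral(raw):
    """Map a vendor-specific record (hypothetical keys) to a flat, vendor-neutral dict."""
    return {
        "source_system": "empower",
        "sample_name": raw["SampleName"],
        "injection_id": raw["InjectionId"],
        "project": raw["Project"],
    }


def enrich(record, metadata):
    """Return a copy of the record with extra metadata merged in."""
    return {**record, **metadata}


def search(index, **criteria):
    """Return every indexed record whose fields match all of the given criteria."""
    return [r for r in index if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical raw record and metadata, for illustration only.
raw = {"SampleName": "STD-01", "InjectionId": 42, "Project": "QC_2024"}
index = [enrich(to_vendor_neutral(raw), {"site": "Boston"})]
hits = search(index, project="QC_2024", site="Boston")
```

The point of the sketch is the ordering: normalization and enrichment happen once, at ingestion, so every downstream consumer can query the same field names regardless of which CDS produced the data.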
Empower Data Latency
With our architecture, customers can expect an average end-to-end latency of 15 minutes or less, with no user intervention needed, and a minimum latency of under 120 seconds from when the injection is complete. This ensures timely access to Empower chromatography data. The overall latency consists of three main components:
- Empower Latency: Time from when an injection is created and initiated in Empower to when the injection data is available for acquisition by the Tetra Empower Agent.
- TetraScience Latency: Duration of acquiring, uploading, and engineering the injection data in TDP, making it searchable for downstream analysis. This latency is controllable (see next section).
- Data Apps Latency: Time required for data to become available in dashboards and apps.
Data latency is impacted by various factors, including the number of agents deployed and the complexity of the data application. The benchmarks below are based on 50 projects, assuming that 20% have been updated per scan, each with 10 new injections.
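To make the arithmetic concrete, here is a minimal Python sketch of the three-part latency model and of the benchmark workload described above. The class and function names are hypothetical, and the per-component latencies in the example are made-up placeholders, not measured results; only the workload parameters (50 projects, 20% updated per scan, 10 new injections each) and the 15-minute average come from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBreakdown:
    """End-to-end latency split into the three components listed above (seconds)."""
    empower_s: float
    tetrascience_s: float
    data_apps_s: float

    @property
    def total_s(self):
        return self.empower_s + self.tetrascience_s + self.data_apps_s


def injections_per_scan(projects, updated_fraction, injections_per_project):
    """Number of new injections one scan has to pick up."""
    updated_projects = round(projects * updated_fraction)
    return updated_projects * injections_per_project


# Benchmark workload from the text: 50 projects, 20% updated, 10 injections each.
workload = injections_per_scan(50, 0.20, 10)  # 10 updated projects * 10 = 100 injections

# Placeholder component values, chosen only to illustrate the sum.
example = LatencyBreakdown(empower_s=60, tetrascience_s=300, data_apps_s=120)
within_average_target = example.total_s <= 15 * 60  # compare with the 15-minute average
```

Under the stated benchmark assumptions, each scan handles 100 new injections. Of the three components, only the TetraScience portion is described here as tunable.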
Strategies for Optimizing TetraScience Latency
Reducing latency involves fine-tuning both the Tetra Empower Agent and the underlying IT infrastructure. TetraScience provides comprehensive deployment guides detailing best practices. Here’s how users can optimize their Empower data workflows:
Tetra Empower Agent Tuning
- Project Categorization: Assign projects as “high” or “normal” priority in the agent management console. High-priority projects are scanned first.
- Agent Settings: Adjust configurations such as the number of generation processes, high-priority process number, and injection prioritization time window.
- Horizontal Scaling: Distribute the scanning workload across multiple agents to minimize latency.
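As a rough illustration of the tuning options above, the following Python sketch orders projects so that high-priority ones are scanned first and then splits the list across several agents. It is a sketch under stated assumptions, not the agent's scheduler: the function names are hypothetical, and round-robin assignment is simply one plausible way to spread projects; this post does not say how the workload is actually partitioned.

```python
def scan_order(projects):
    """Order (name, priority) pairs so "high" projects come before "normal" ones.

    sorted() is stable, so projects keep their relative order within each priority.
    """
    return sorted(projects, key=lambda p: 0 if p[1] == "high" else 1)


def partition_round_robin(project_names, agent_count):
    """Assign projects to agents in turn (an assumed strategy, for illustration)."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    buckets = [[] for _ in range(agent_count)]
    for i, name in enumerate(project_names):
        buckets[i % agent_count].append(name)
    return buckets


projects = [("A", "normal"), ("B", "high"), ("C", "normal"), ("D", "high")]
ordered = [name for name, _ in scan_order(projects)]  # ["B", "D", "A", "C"]
per_agent = partition_round_robin(ordered, 2)         # [["B", "A"], ["D", "C"]]
```

Partitioning the already-ordered list means each agent receives a share of the high-priority projects, so time-sensitive data is not queued behind a single busy agent.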
Customer IT Setup Tuning
- System Resource Management: Make sure the Empower server host and Tetra Empower Agent host are equipped with the recommended CPU and memory resources.
- Empower Server Tuning: Optimize server and database settings to enhance overall performance.
Closing Remarks
Our goal is to provide a scalable, robust, and easily configurable solution that integrates seamlessly into biopharma workflows. This includes support for major CDS platforms like Shimadzu LabSolutions and Thermo Scientific Chromeleon. By continuously innovating, we aim to empower scientists to extract critical insights from their chromatography data with unprecedented speed and efficiency.