Blog

Taming the Flow Beast: Casey's Epic Day of Cytometry, Data, and Questionable Server Space

September 17, 2024

Lindsey Weed
Scientific Business Analyst (Sciborg)

Dr. Casey Patel starts their day early to prepare for another rigorous session of flow cytometry experiments for their immunology research. Like many R&D scientists, they rely on this technique to generate detailed data on immune cell populations, but the sheer volume and complexity of the data present significant challenges. Here’s a closer look at a typical day, with a focus on the technical realities of managing flow cytometry data.

Morning: Sample Preparation and Calibration

The day begins with coffee, followed shortly by sample preparation. Casey needs to time their work carefully to be ready to use the research core-managed flow cytometer within the time block they booked. Today’s experiment involves staining cultured immune cells with a panel of fluorochrome-conjugated antibodies targeting surface markers like CD3, CD4, CD8, and CD19. The goal is to characterize and quantify different T- and B-cell populations to understand the immune response to different lead candidates, and the results will be used to assess whether each drug candidate is achieving its intended effect.

They succeed at getting to the flow cytometer on time, but before they can begin running their samples, they must perform routine calibration of the instrument with fluorescently labeled beads. They carefully review the calibration results, which are crucial for establishing baseline fluorescence intensity settings, ensuring the instrument is functioning within expected parameters, and preventing shifts in data that could lead to inaccurate results, a scenario all too familiar to anyone who has experienced batch effects due to calibration drift.

If Casey has repeat calibration failures or notices trends indicating process drift, they’re instructed to alert one of the core lab managers, allowing them to investigate the cause, recalibrate the instrument, or perform necessary maintenance to prevent inaccurate data in subsequent experiments. This helps to maintain the integrity of the data produced by the flow cytometry core, contributing to the overall reliability of the drug development process. However, with their day often filled with multiple tasks and tight deadlines, Casey sometimes forgets to report these issues immediately. While they know the importance of timely communication for the efficiency of the research team, the demands of their busy schedule occasionally lead to delays in notifying the core team. Luckily, today’s calibration proceeds without a hitch.

Midday: Running the Flow Cytometry Experiment

Casey can now begin their experiment. They’re using a multi-laser instrument capable of detecting eight different fluorochromes simultaneously. The instrument settings have been optimized based on the spectral overlap of the chosen fluorochromes, with compensation matrices pre-calculated using single-stained controls.
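For readers less familiar with compensation, the arithmetic behind it is straightforward even though the instrument and analysis software handle it automatically. The sketch below uses made-up spillover values for a hypothetical three-color panel; real matrices are derived from single-stained controls like the ones mentioned above.

```python
import numpy as np

# Hypothetical 3-color example. Rows = fluorochromes, columns = detectors;
# spillover[i, j] is the fraction of fluorochrome i's signal seen in detector j.
# Real values come from single-stained controls, not guesses like these.
spillover = np.array([
    [1.00, 0.12, 0.02],
    [0.05, 1.00, 0.10],
    [0.01, 0.08, 1.00],
])

# Raw intensities: one row per event (cell), one column per detector.
raw = np.array([
    [1200.0,  300.0,  80.0],
    [  90.0, 2500.0, 310.0],
])

# measured = true @ spillover, so compensation inverts the spillover matrix.
compensated = raw @ np.linalg.inv(spillover)
print(np.round(compensated, 1))
```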

As each sample runs, data is collected in real time, generating files that contain millions of events. Each event corresponds to a single cell, with parameters such as forward scatter (FSC), side scatter (SSC), and fluorescence intensities recorded. This raw data is stored in FCS format, a standard file type for flow cytometry data, but the size of these files is significant—especially when considering that today’s experiment involves running over 20 samples, each generating multi-megabyte files.
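As a rough illustration of what working with these files looks like programmatically, here is a minimal sketch using fcsparser, one of several open-source Python FCS readers; the file name is a placeholder, not part of Casey’s actual workflow.

```python
import fcsparser  # open-source FCS reader: pip install fcsparser

# Placeholder file name; a real session would loop over all 20+ sample files.
path = "sample_01.fcs"

# parse() returns the FCS keyword metadata and the event table:
# one row per event (cell), one column per parameter (FSC, SSC, fluorescence).
meta, events = fcsparser.parse(path)

print(meta.get("$TOT"))      # total number of recorded events, per the FCS keywords
print(list(events.columns))  # e.g., FSC-A, SSC-A, plus the fluorescence channels
print(len(events))           # large acquisitions run into the millions of rows
```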

Early Afternoon: Initial Data Review and Gating

Once the samples are run, Casey transitions to data analysis. The first step is to save their data to their team’s local network share. They click around to find the correct folder under their name for flow cytometry data and create a new folder named with a unique, descriptive experiment title. Casey has learned more than once that without this information it can be difficult to find past data when it’s needed; more than one experiment has had to be repeated. They finish cleaning up the instrument and head back to their office to start importing the FCS files into specialized analysis software. Here, they perform initial gating to identify and exclude debris, doublets, and dead cells. Using FSC and SSC parameters, they set gates to define lymphocyte populations, followed by more specific gating for T-cell subsets.
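To make the gating logic concrete, the sketch below expresses a simplified scatter-based pre-gate as boolean filters on an event table like the one loaded in the earlier sketch. The channel names and thresholds are illustrative assumptions only; Casey’s actual gates are drawn interactively in their analysis software and tuned per experiment.

```python
import pandas as pd

def gate_lymphocytes(events: pd.DataFrame) -> pd.DataFrame:
    """Simplified scatter-based pre-gating; thresholds are illustrative only."""
    # 1. Exclude debris: events with very low forward/side scatter.
    not_debris = (events["FSC-A"] > 20_000) & (events["SSC-A"] > 5_000)

    # 2. Exclude doublets: singlets have FSC-H roughly proportional to FSC-A.
    singlets = (events["FSC-H"] / events["FSC-A"]).between(0.8, 1.2)

    # 3. Rough lymphocyte region on the FSC-A / SSC-A plane.
    lymphocytes = (events["FSC-A"].between(30_000, 90_000)
                   & events["SSC-A"].between(5_000, 40_000))

    return events[not_debris & singlets & lymphocytes]

# Usage: gated = gate_lymphocytes(events)
```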

The complexity of this step cannot be overstated. Gating is a subjective process that can introduce variability, particularly when multiple analysts are involved. To mitigate this, Casey employs a consistent gating strategy across all samples, carefully documenting their gates and parameters. They also consider complementing manual gating with automated, high-dimensional approaches such as FlowSOM clustering or t-SNE visualization, though these methods often require computational resources and expertise beyond what is immediately available to them.
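As an example of what such an exploratory step might look like, here is a small sketch that embeds synthetic marker intensities with scikit-learn’s t-SNE implementation. The data, the arcsinh cofactor, and the perplexity are all placeholder choices, and FlowSOM-style clustering would typically be run through dedicated flow cytometry packages instead.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for arcsinh-transformed fluorescence intensities of a few
# thousand gated events across 4 markers (values are made up for illustration).
rng = np.random.default_rng(0)
X = np.arcsinh(rng.gamma(shape=2.0, scale=300.0, size=(5_000, 4)) / 150.0)

# t-SNE projects the 4-dimensional marker space into 2D; clusters in the
# embedding often correspond to distinct cell populations.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(embedding.shape)  # (5000, 2), ready to plot and color by marker intensity
```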

Data Management Challenges: Storage, Analysis, and Integration

By mid-afternoon, Casey faces one of the most persistent challenges in their work: managing all the data. The experiment has generated a substantial amount of data that now needs to be stored, analyzed, and integrated with other datasets. Some of the key challenges they encounter are:

  1. Data Storage: Each FCS file from the day’s experiment is large, and collectively, they require significant digital storage space. Casey must ensure that this data is securely stored, backed up, and organized in a way that facilitates easy retrieval for future analysis. The lab’s server has limited capacity, and archiving older data is becoming increasingly necessary. 
  2. Data Analysis: The raw data needs extensive analysis to extract meaningful results. This involves manual gating and complex statistical analysis to compare cell populations across different samples. Casey is proficient with FlowJo, a popular program for analyzing flow cytometry data, but the software’s limitations in handling large datasets and integrating with other bioinformatics tools are a concern. They often find themselves exporting data to R or Python for more sophisticated statistical analysis, which adds another layer of complexity and time to the workflow.
  3. Data Integration: Casey’s research doesn’t exist in isolation. They need to integrate their flow cytometry data with other datasets they or collaborators have generated, such as RNA-seq results from the same samples. This integration is non-trivial; it involves harmonizing data formats, scaling, and often developing custom scripts to manage the different outputs (a minimal merging sketch follows this list). The lack of standardized pipelines for integrating flow cytometry data with other omics data types is a bottleneck that limits the pace of their research.
  4. Data Sharing and Collaboration: Collaboration with other researchers, both within and outside their institution, is a critical aspect of Casey’s work. Sharing flow cytometry data can be challenging due to differences in software, data formats, and analysis approaches. To address this, they use shared platforms like Cytobank for collaborative analysis, though the need for consistent data annotations and standardized protocols remains a constant struggle.
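A toy illustration of the integration point above: once per-sample population frequencies have been exported from the flow analysis, joining them to RNA-seq-derived values often comes down to agreeing on a shared sample identifier. All names and numbers below are invented.

```python
import pandas as pd

# Invented per-sample population frequencies exported from the flow analysis.
flow_summary = pd.DataFrame({
    "sample_id": ["S01", "S02", "S03"],
    "pct_cd4_t": [34.2, 28.7, 41.5],
    "pct_cd8_t": [22.1, 25.4, 18.9],
})

# Invented RNA-seq-derived values for the same samples.
rnaseq_scores = pd.DataFrame({
    "sample_id": ["S01", "S02", "S03"],
    "ifng_expression": [5.1, 3.8, 6.4],
})

# A consistent sample identifier is the key requirement; in practice, most of
# the custom scripting goes into harmonizing IDs, formats, and scaling.
merged = flow_summary.merge(rnaseq_scores, on="sample_id", how="inner")
print(merged)
```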

Late Afternoon: Final Analysis and Reporting

As the day begins to wrap up, Casey moves into the final stages of data analysis. They apply statistical tests to compare the frequency and phenotype of immune cell populations across different groups of cultured immune cells. Significant findings are plotted using tools like GraphPad Prism, with results showing clear differences in T-cell subsets between cells exposed to various drug candidates. These differences provide valuable insights into how each candidate influences the immune response, guiding further refinement and selection of the most promising therapy.
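As a rough sketch of this kind of comparison, the snippet below runs a two-sample t-test on invented percent-CD8+ values for two candidates; the real analysis may use different tests (and multiple-comparison corrections) depending on the experimental design.

```python
from scipy import stats

# Invented %CD8+ T-cell frequencies per replicate well for two drug candidates.
candidate_a = [18.2, 21.5, 19.8, 22.1, 20.4]
candidate_b = [27.9, 31.2, 29.5, 33.0, 30.1]

# A two-sample t-test is one common choice for comparing subset frequencies
# between groups; nonparametric tests are often preferred for small samples.
t_stat, p_value = stats.ttest_ind(candidate_a, candidate_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```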

However, this is just one part of the workflow. Casey must also document their findings in a way that is reproducible and transparent within their electronic lab notebook (ELN). This includes detailing the exact gating strategy used, any deviations from the protocol, and the statistical methods applied. This level of detail is essential for meeting industry standards, facilitating collaboration with other researchers, and ensuring that the work can be independently verified and validated, which is critical for the advancement of potential therapies. Since Casey has performed this same analysis before, they can save some time by copying information from past entries.

Finally, Casey reviews the day’s results with a critical eye, planning the next steps. The data generated today will inform the design of future experiments and lead to more targeted studies on how specific drug candidates affect particular immune cell populations. This iterative process is essential for refining the development of therapies, ensuring that each step builds on solid, data-centric foundations that allow work and decisions to be traced and that support good scientific practice.

End of the Day: Addressing the Data Management Bottleneck

For R&D scientists like Casey, flow cytometry is a powerful tool, but it comes with significant data management challenges. The volume and complexity of the data require sophisticated storage solutions, robust analysis tools, and effective strategies for data integration, reuse, and sharing. As research increasingly relies on high-dimensional data, the need for better data management practices becomes more critical. Casey’s experience highlights the ongoing need for improved tools and workflows that can handle the demands of modern flow cytometry to advance scientific research and make the most of the data generated in the lab.

Curious how leading organizations are streamlining their flow cytometry workflows? Check out our case study to learn how one company transformed their processes with TetraScience.