Abstract
Flow cytometry is a cornerstone of biomedical research and clinical diagnostics, yet cross-site analysis remains difficult due to variations in instrumentation and protocols. Although incorporating instrument identifiers can aid data integration, such methods often lack generalizability when extended to previously unseen devices or datasets. In this study, we propose a data-driven representation learning strategy that learns compact instrument-specific embeddings within a probabilistic framework, enabling robust cross-site data integration. Our results demonstrate that the approach achieves high classification accuracy, effectively merges data from multiple platforms, and preserves the inherent biological variation of flow cytometry measurements.