Computational Services

SOLQC: Synthetic Oligo Library Quality Control tool

SOLQC is an innovative tool designed to support the quality control and assessment of synthetic oligo libraries, particularly as these libraries become more prevalent and complex in synthetic biology research. Leveraging NGS (Next-Generation Sequencing) data provided by the user, SOLQC offers fast, in-depth analysis that includes statistical insights into variant distribution, error rates, and their relationships to specific sequence or library characteristics. SOLQC enhances data interpretation, and has the potential to improve data reliability and aid in more accurate scientific inference.

Service 2

Pending...

Research Projects

Efficient DNA-based data storage using shortmer combinatorial encoding

DNA data storage offers compact, durable archival solutions, with composite DNA alphabets enhancing logical density. This paper presents a new combinatorial encoding method, achieving up to a 6.5-fold increase in storage density with near-zero reconstruction error. Using distinct DNA shortmers to create large alphabets, each letter comprises a subset of shortmers, enabling efficient information encoding. We formalize combinatorial schemes, analyze properties like information density and reconstruction probability, and design an end-to-end storage system, integrating 2D error correction. Simulations show 2D Reed-Solomon codes significantly improve reconstruction, confirmed by successful combinatorial synthesis experiments, underscoring the approach’s robustness and potential.

Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems

This study introduces a model for determining the required sequencing coverage in DNA-based data storage, focusing on combinatorial DNA encoding. It uses a variant of the coupon collector distribution and a Markov Chain representation to characterize the distribution of sequencing reads needed for error-free message reconstruction. Theoretical bounds on decoding probability are developed and validated through simulations. This work provides insights into sequencing coverage, decoding complexity, and error correction. Additionally, a Python package is offered to compute required read coverage, based on code design, message parameters, and a desired confidence level.

Error-Correcting Codes for Combinatorial Composite DNA

DNA data storage is emerging as a potential solution for archival data, with the combinatorial composite DNA synthesis method extending its capacity by using short DNA fragments, or shortmers, as building blocks. However, missing shortmers during reading can cause symbol errors. This paper models these as asymmetric errors and proposes error-correcting code constructions with a lower bound on redundancy, providing an encoder and decoder for this setup. The error model is supported by experimental data and includes a statistical evaluation of error probability based on read depth.