DiDAX Project at the Yakhini Group

The DiDAX project aims to revolutionize DNA data storage by developing new, cost-effective encoding and decoding algorithms, synthesis techniques, and embedding technologies. Through innovative methods for encoding, DNA synthesis, and cryptographic authentication, DiDAX will enable efficient, low-cost data archiving using both standard and composite DNA. The project also explores in-product information storage through encapsulation and embedding, creating tailored solutions for reliable data retrieval. In addition, DiDAX is advancing photolithography-based and composite synthesis approaches to make DNA storage more accessible and scalable, with the goal of setting a new standard for secure, long-term data storage.
The Yakhini group joined DiDAX in 2023. This site provides access to some of our analysis services as they are developed.

Computational Services

SOLQC: Synthetic Oligo Library Quality Control tool

SOLQC is a tool for quality control and assessment of synthetic oligo libraries, which are becoming more prevalent and complex in synthetic biology research. Using NGS (next-generation sequencing) data provided by the user, SOLQC performs a fast, in-depth analysis that reports statistics on variant distribution and error rates and relates them to specific sequence or library characteristics. This supports data interpretation and has the potential to improve data reliability and enable more accurate scientific inference.

Link to paper

Service 2

Pending...

Research Projects

Efficient DNA-based data storage using shortmer combinatorial encoding

DNA data storage offers compact, durable archival solutions, with composite DNA alphabets enhancing logical density. This paper presents a new combinatorial encoding method, achieving up to a 6.5-fold increase in storage density with near-zero reconstruction error. Using distinct DNA shortmers to create large alphabets, each letter comprises a subset of shortmers, enabling efficient information encoding. We formalize combinatorial schemes, analyze properties like information density and reconstruction probability, and design an end-to-end storage system, integrating 2D error correction. Simulations show 2D Reed-Solomon codes significantly improve reconstruction, confirmed by successful combinatorial synthesis experiments, underscoring the approach’s robustness and potential.
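
As a rough illustration of why subset-based letters enlarge the alphabet, the sketch below counts the letters available when each letter is a subset of a fixed size drawn from a pool of distinct shortmers, and converts that count to bits per letter. The function name and the parameter values are hypothetical and chosen only for illustration; the assumption that every subset of the given size is a valid letter is ours and may not match the scheme in the paper.

```python
from math import comb, log2


def bits_per_letter(n_shortmers: int, subset_size: int) -> float:
    """Bits carried by one combinatorial letter.

    Assumption (not taken from the paper): every subset of `subset_size`
    shortmers out of `n_shortmers` is a valid letter, so the alphabet
    has C(n_shortmers, subset_size) letters.
    """
    if not 0 < subset_size <= n_shortmers:
        raise ValueError("subset_size must be in 1..n_shortmers")
    return log2(comb(n_shortmers, subset_size))


# Hypothetical parameters, for illustration only.
n, k = 16, 5
print(comb(n, k), "letters,", round(bits_per_letter(n, k), 2), "bits per letter")
```

Under this assumption the alphabet size grows combinatorially with the size of the shortmer pool, which is the source of the density gain described above.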

Link to paper

Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems

This study introduces a model for determining the required sequencing coverage in DNA-based data storage, focusing on combinatorial DNA encoding. It uses a variant of the coupon collector distribution and a Markov Chain representation to characterize the distribution of sequencing reads needed for error-free message reconstruction. Theoretical bounds on decoding probability are developed and validated through simulations. This work provides insights into sequencing coverage, decoding complexity, and error correction. Additionally, a Python package is offered to compute required read coverage, based on code design, message parameters, and a desired confidence level.
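
To give a feel for the coupon collector view of coverage, here is a minimal sketch based on the classic coupon collector problem, not the variant or the Markov chain model developed in the paper. It assumes a letter made of k shortmers, each read revealing one of them uniformly at random, and asks how many reads are needed before all k have been seen. The function names are hypothetical and are not the API of the Python package mentioned above.

```python
from math import comb


def expected_reads(k: int) -> float:
    """Expected reads until all k shortmers are seen (classic result k * H_k)."""
    return k * sum(1.0 / i for i in range(1, k + 1))


def prob_all_seen(k: int, reads: int) -> float:
    """Probability that `reads` uniform reads cover all k shortmers.

    Inclusion-exclusion over the set of shortmers that are missed.
    """
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** reads for j in range(k + 1))


def min_reads(k: int, confidence: float) -> int:
    """Smallest read count whose coverage probability reaches `confidence`."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    reads = k  # fewer than k reads can never cover k shortmers
    while prob_all_seen(k, reads) < confidence:
        reads += 1
    return reads


# Hypothetical example: a letter of 5 shortmers, 99% confidence.
print(expected_reads(5), min_reads(5, 0.99))
```

This toy model ignores sequencing errors, non-uniform shortmer abundance, and the error-correcting code, all of which affect the coverage actually required.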

Link to paper

Error-Correcting Codes for Combinatorial Composite DNA

DNA data storage is emerging as a potential solution for archival data, with the combinatorial composite DNA synthesis method extending its capacity by using short DNA fragments, or shortmers, as building blocks. However, missing shortmers during reading can cause symbol errors. This paper models these as asymmetric errors and proposes error-correcting code constructions with a lower bound on redundancy, providing an encoder and decoder for this setup. The error model is supported by experimental data and includes a statistical evaluation of error probability based on read depth.
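
The asymmetry of this error model can be illustrated with a small simulation. The sketch below assumes a letter is a set of shortmers and that each read reveals one member of that set uniformly at random, so the observed set can lose shortmers but never gain ones that were not written. The shortmer strings, function names, and the uniform sampling are illustrative assumptions, not the paper's model or code.

```python
import random


def read_letter(letter, depth, rng):
    """Return the set of shortmers observed in `depth` uniform reads of `letter`."""
    members = sorted(letter)  # fixed order so a seeded run is reproducible
    return {rng.choice(members) for _ in range(depth)}


def symbol_error_rate(letter, depth, trials=10_000, seed=0):
    """Fraction of trials in which at least one shortmer of the letter is missed."""
    rng = random.Random(seed)
    truth = set(letter)
    errors = sum(read_letter(letter, depth, rng) != truth for _ in range(trials))
    return errors / trials


# Hypothetical letter of three shortmers.
letter = {"AAC", "CGT", "TTG"}
observed = read_letter(letter, depth=4, rng=random.Random(1))
print(observed <= letter)  # the observed set is always a subset of the written one
print(symbol_error_rate(letter, depth=4), symbol_error_rate(letter, depth=20))
```

In this simplified setting the estimated symbol error rate falls as read depth grows, which is the qualitative dependence on read depth that the statistical evaluation above refers to.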

Link to paper