--- title: "Manuscript data availabiliy" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Manuscript data availabiliy} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: refs.bib link-citations: yes --- ## Datasets Two datasets were used to produce the figure in the paper: 1. snmC-seq2 scWGBS data [@luo2018a] for single-cell benchmarks. 2. bulk WGBS from a project in our lab Both datasets were aligned with BISCUIT [@zhou2024]. To produce Bismark BEDgraph files [@krueger2011], we used a python script to convert the beta and coverage value columns to percentage methylation, unmethylated reads and methylated read columns. ```py import sys import gzip bedfile = sys.argv[1] def bed2cov(line): chr, start, end, beta, cov = line.split('\t') percent, meth, unmeth = convert(float(beta), int(cov)) return '\t'.join([chr, start, start, str(percent), str(meth), str(unmeth)]) def convert(beta, cov): percent = round(beta * 100) meth = round(beta * cov) unmeth = cov - meth return [percent, meth, unmeth] with gzip.open(bedfile, 'rt') as bed: for line in bed: print(bed2cov(line)) ``` Using GNU parallel: ```sh parallel python bed2cov.py /{} '>' /{/.} ::: biscuit_path/*.bed.gz ``` Then using the `iscream.paper` package at we ran the benchmarks and produced the figures. ## References