The DemuxFastqs tool in fgbio is a flexible tool for sample demultiplexing. It takes FASTQs as input, one per sub-read (e.g., index read, read one, read two, fragment). You tell it which bases are part of the sample barcode, molecular barcode, and template, and which should be ignored (see Read Structures). Some sequencers (e.g., MiSeq) … Read More
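To make the idea of a read structure concrete, here is a minimal sketch in Python (not the Scala used by fgbio, and not how DemuxFastqs is implemented). It assumes the usual fgbio segment codes, T for template, B for sample barcode, M for molecular barcode, and S for skipped bases, with `+` meaning "all remaining bases" on the last segment. The function name `split_by_read_structure` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed segment kinds: T = template, B = sample barcode,
# M = molecular barcode, S = skip (ignored bases).
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def split_by_read_structure(bases: str, structure: str) -> Dict[str, List[str]]:
    """Split one sub-read into its segments, grouped by segment kind.

    Hypothetical helper for illustration only.
    """
    segments = SEGMENT.findall(structure)
    # Reject structures containing anything the pattern did not consume.
    if "".join(length + kind for length, kind in segments) != structure:
        raise ValueError(f"invalid read structure: {structure}")

    out: Dict[str, List[str]] = {"T": [], "B": [], "M": [], "S": []}
    offset = 0
    for i, (length, kind) in enumerate(segments):
        if length == "+":
            # '+' consumes the rest of the read, so it must come last.
            if i != len(segments) - 1:
                raise ValueError("'+' is only allowed on the last segment")
            end = len(bases)
        else:
            end = offset + int(length)
            if end > len(bases):
                raise ValueError("read is shorter than its read structure")
        out[kind].append(bases[offset:end])
        offset = end
    return out


parts = split_by_read_structure("ACGTACGTTTGGCCAA", "8B2S+T")
print(parts["B"], parts["S"], parts["T"])
```

Here the first eight bases are treated as the sample barcode, the next two are skipped, and everything after that is template.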
Bioinformatic Tool Series: SAM to FASTQ with UMIs and Barcodes
This continues the series of posts meant to help you write concise and well-tested bioinformatic tools. Today I’ll briefly describe a tool named SamToFastq that converts a SAM (or BAM) to FASTQ when the SAM has bases and qualities stored in auxiliary tags that need to be included in the FASTQ’s bases and qualities. This … Read More
Bioinformatics Tools Series: Logging and Progress Logging
This continues the series of posts meant to help you write concise and well-tested bioinformatic tools. We previously wrote a tool to ensure that the NM/UQ/MD SAM tags on each read are accurate. Let’s add some logging to the tool to print out the input and output SAM or BAM file paths, and report some progress … Read More
Bioinformatics Tools Series: Writing a SAM or BAM file
This continues the series of posts meant to help you write concise and well-tested bioinformatic tools. Let’s write a tool that reads a BAM file, modifies the records, then writes a BAM file. Let’s ensure that the NM/UQ/MD SAM tags on each read are accurate, which is important if aligners aren’t well-behaved … Read More
Bioinformatics Tools Series: Reading a SAM or BAM file
This continues the series of posts meant to help you write concise and well-tested bioinformatic tools. Let’s write a tool that reads a SAM or a BAM file. Since just reading it is boring, let’s count the number of fragment and paired-end reads respectively. Here’s the tool: Let’s examine it in a little more … Read More
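As a rough illustration of the counting idea (not the Scala tool from the post itself), the sketch below reads SAM text lines in Python and uses the standard FLAG bit 0x1 to decide whether a record is paired. It assumes plain-text SAM input and counts every record, including secondary and supplementary ones. The function name `count_fragment_and_paired` is hypothetical.

```python
from typing import Iterable, Tuple

PAIRED_FLAG = 0x1  # SAM FLAG bit: template has multiple segments (paired)


def count_fragment_and_paired(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (fragment_count, paired_count) over SAM alignment records.

    Hypothetical helper for illustration only.
    """
    fragment = 0
    paired = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & PAIRED_FLAG:
            paired += 1
        else:
            fragment += 1
    return fragment, paired


records = [
    "@HD\tVN:1.6\tSO:unsorted",
    "frag1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "pair1\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII",
    "pair1\t147\tchr1\t300\t60\t4M\t=\t200\t-104\tACGT\tIIII",
    "frag2\t16\tchr1\t400\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_fragment_and_paired(records))
```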
Bioinformatics Tools Series: SAM Optional Tags
This continues the series of posts meant to help you write concise and well-tested bioinformatic tools. One common task is to retrieve, set, or update the SAM optional tags on each SAM record. The SAM optional fields (tags) specification lists predefined standard tags, though users can add their own (start with an X, Y, … Read More
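For readers unfamiliar with the text form of optional tags, here is a small Python sketch (again, not the Scala API the series uses). Each optional field has the form `TAG:TYPE:VALUE`. The sketch converts only the `i` (integer) and `f` (float) types and leaves every other type as a string, which is a simplification. It also applies the SAM convention that tags starting with X, Y, or Z, or containing a lowercase letter, are reserved for local use. Both function names are hypothetical.

```python
from typing import Tuple, Union

TagValue = Union[int, float, str]


def parse_tag(field: str) -> Tuple[str, TagValue]:
    """Parse one 'TAG:TYPE:VALUE' optional field into (tag, value).

    Hypothetical helper; only 'i' and 'f' are converted, other types
    (A, Z, H, B) are returned as the raw string.
    """
    # Split at most twice so that ':' inside a string value is preserved.
    tag, type_code, raw = field.split(":", 2)
    if type_code == "i":
        return tag, int(raw)
    if type_code == "f":
        return tag, float(raw)
    return tag, raw


def is_local_use(tag: str) -> bool:
    """True if the tag name falls in the range reserved for local use."""
    return tag[0] in "XYZ" or any(c.islower() for c in tag)


for field in ["NM:i:3", "MD:Z:10A5", "XS:Z:a:b"]:
    tag, value = parse_tag(field)
    print(tag, repr(value), is_local_use(tag))
```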
Building Concise and Well-Tested Bioinformatic Tools
This is the start of a series of posts introducing various ways of implementing cohesive, concise, and well-tested bioinformatic tools in Scala using many of the APIs found in fgbio (see the latest scaladoc, list of tools, and list of metrics). Each post will build on the previous ones to help familiarize folks with how I build simple … Read More
Sequencing Error-reduction through Consensus Calling Reads from the same Source Molecule
Dramatic sequencing error reduction can be achieved by calling consensus reads from reads that observe the same source molecule. This is extremely important for low-frequency variant detection in somatic samples, as well as for genotyping STRs (short tandem repeats), where high rates of “stutter” can be overcome. I will review the method behind calling consensus reads. There are two … Read More
Single Strand UMI Somatic Variant Calling
Somatic tumor-only variant calling can be improved by incorporating a UMI per strand on the source molecule. You can also perform Duplex Sequencing (see Kennedy et al. 2014), but I will go into that in a later post. Below I will detail the process as well as give you a command line tool that automates all … Read More
Sample Demultiplexing an Illumina Sequencing Run
Sample demultiplexing an Illumina sequencing run can be a real pain if you’re the one having to do it. Typically you would just let the instrument do it for you, or perhaps even use Illumina’s bcl2fastq, but sometimes you don’t have that choice. You may only have access to the run folder, loathe making FASTQ … Read More