This continues the series of posts meant to help you write concise and well-tested bioinformatic tools.

We previously wrote a tool to ensure that the NM/UQ/MD SAM tags on each read are accurate.  Let's add some logging to the tool, so that it prints the input and output SAM or BAM file paths and reports progress as it works through the input file:

Let's examine it in a little more depth, focusing on the differences from the previous example:

  • line 22: we mix in the LazyLogging trait, which provides a logger member variable of type Logger. This lets us write logging information at various levels (e.g. info, debug, error, fatal), nicely formatted on the command line.
  • line 30: the ProgressLogger class records progress to the logging system when iterating through records using the record() methods.  We use the unit parameter to specify that we want a logging statement every million records.
  • lines 34-35: we log the file paths of the input and output SAM or BAM files.
  • line 39: the record is recorded with the ProgressLogger.
  • line 41: the total number of records read in is logged.
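
To make the pattern in the list above concrete, here is a minimal, dependency-free sketch of the same flow: log the input and output paths, record each item with a progress logger, and log the total at the end. Note that SimpleProgressLogger below is a hypothetical stand-in written for this illustration, not the ProgressLogger class used by the tool, and the object name, method names, message formats, and file paths are all assumptions.

```scala
object ProgressLoggingSketch {
  /** Hypothetical stand-in for a progress logger: emits a message every `unit` records. */
  final class SimpleProgressLogger(log: String => Unit,
                                   noun: String = "records",
                                   verb: String = "processed",
                                   unit: Int = 1000000) {
    require(unit > 0, "unit must be positive")
    private var count: Long = 0L

    /** Records one item and logs a progress message every `unit` items. */
    def record(): Unit = {
      count += 1
      if (count % unit == 0) log(s"$verb $count $noun")
    }

    /** The total number of items recorded so far. */
    def total: Long = count
  }

  /** Logs the (illustrative) paths, records each item, and returns every message that was logged. */
  def run(records: Iterator[String], input: String, output: String, unit: Int): Seq[String] = {
    val messages = scala.collection.mutable.ArrayBuffer.empty[String]
    // A stand-in for logger.info: print the message and remember it.
    val log: String => Unit = msg => { messages += msg; println(s"[info] $msg") }

    log(s"Input: $input")
    log(s"Output: $output")

    val progress = new SimpleProgressLogger(log, verb = "Read", unit = unit)
    records.foreach { _ =>
      // A real tool would process and write the record here.
      progress.record()
    }
    log(s"Read ${progress.total} records in total")
    messages.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Small unit so that progress messages appear for a tiny input.
    run(Iterator.fill(25)("read"), input = "in.bam", output = "out.bam", unit = 10)
  }
}
```

With 25 records and a unit of 10, the sketch emits progress messages after the 10th and 20th records, followed by the final total. The real ProgressLogger takes a logger rather than a plain function, so treat this only as an outline of where each logging call sits in the loop.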

Let's look at the logging output in the terminal. Notice that the writer logs progress while writing, whereas we need to manually log progress when reading.

So in only a few extra lines of code, we can give the tool's user some useful information about the inputs and outputs, as well as report progress while reading the input.  Nice, huh?
