This is the start of a series of posts introducing various ways of implementing cohesive, concise, and well-tested bioinformatic tools in Scala using many of the APIs found in fgbio (see the latest scaladoc, list of tools, and list of metrics).  Each post will build on the previous ones to help familiarize folks with how I build simple and well-tested bioinformatic tools.  I am going to skip how to set up a project, along with a few of the important prerequisites for getting a compiling tool suite; for those, I refer you to the fgbio repo itself, or to the tools sub-repo in the bfx-examples project. Let's start!

Let's build a simple tool to print hello world:
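A minimal, self-contained sketch of such a tool, under stated assumptions: `ClpGroup`, `clp`, `BfxExamplesTool`, and the `Examples` group below are simplified stand-ins written for this example so that it compiles on its own. They are not the real sopt/fgbio definitions, and the line numbers in the walkthrough do not map onto this sketch.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical stand-ins for the real library types, so the sketch compiles alone.
trait ClpGroup { def name: String; def description: String }
final class clp(description: String, group: ClpGroup) extends StaticAnnotation
trait BfxExamplesTool { def execute(): Unit }

// The functional group this tool belongs to (a singleton object; name is made up).
object Examples extends ClpGroup {
  val name: String        = "Examples"
  val description: String = "Small example tools."
}

// The annotation carries the description and the functional group.
@clp(
  description = "Prints hello world.",
  group       = Examples
)
class HelloWorld extends BfxExamplesTool {
  // All of the tool's work goes in execute().
  override def execute(): Unit = println("Hello World!")
}

object HelloWorldApp {
  def main(args: Array[String]): Unit = new HelloWorld().execute()
}
```

In a real project the command line framework, not a hand-written `main`, would discover the annotated class and call `execute()`.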

Let's examine this line by line:

    • line 1: this defines the package, and in this case I am testing this in the bfx-examples repository
    • line 3: we organize the tools into groups based on function (e.g. operates on BAMs, manipulates UMIs); each group is represented by a ClpGroup
    • line 4: the clp annotation marks which classes are command line tools and so should be exposed on the command line
    • line 5: all tools should extend this class (BfxExamplesTool); its execute() method is where each tool should put its implementation
    • lines 7-9: the @clp annotation specifies a textual description that will be shown on the command line, as well as the functional group to which the tool belongs
    • line 8: the description can be a multi-line string with basic Markdown support (tables are hard), but it needs to be static
    • line 9: the singleton object that specifies the functional group to which this tool belongs
    • line 11: the class definition; it just extends BfxExamplesTool
    • line 12: the implementation of the execute() method, where we simply print “Hello World!”

I have purposely omitted a number of things, including but not limited to:

  • how to specify command line options
  • the convention for validating command line options

For a brief preview, take a look at a tool to print your name:
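A minimal, self-contained sketch of what such a tool could look like. Everything here is an assumption made for illustration: `ClpGroup`, `clp`, `arg`, `BfxExamplesTool`, and `Examples` are simplified stand-ins rather than the real sopt/fgbio definitions, the option is modelled as an annotated constructor parameter, and validation is assumed to happen in the class body using plain `require`.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical stand-ins for the real library types, so the sketch compiles alone.
trait ClpGroup { def name: String; def description: String }
final class clp(description: String, group: ClpGroup) extends StaticAnnotation
final class arg(flag: Char, doc: String) extends StaticAnnotation
trait BfxExamplesTool { def execute(): Unit }

// A made-up functional group for the example.
object Examples extends ClpGroup {
  val name: String        = "Examples"
  val description: String = "Small example tools."
}

@clp(description = "Prints your name.", group = Examples)
class PrintName(
  // Assumed convention: each command line option is an annotated constructor parameter.
  @arg(flag = 'n', doc = "The name to print.") val name: String
) extends BfxExamplesTool {
  // Assumed convention: validate options when the tool is constructed,
  // so bad input fails before execute() is ever called.
  require(name.trim.nonEmpty, "name must not be empty")

  override def execute(): Unit = println(name)
}
```

Constructing `new PrintName("  ")` in this sketch fails immediately with an `IllegalArgumentException`, which keeps `execute()` free of input checks.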

If you’re impatient, take a look at RemoveSamTags in fgbio to see a very readable, purpose-built bioinformatics tool.
