This continues the series of posts meant to help you write concise and well-tested bioinformatic tools.

A common task is to retrieve, set, or update the optional SAM tags on each SAM record.  The SAM optional fields (tags) specification lists predefined standard tags, though users can add their own (starting with X, Y, Z, or a lowercase character).  Let's look at how to access and update SAM tags in SamRecords.

If you're not familiar with Scala's apply and update methods, now is a good time to look them up. The same goes for Scala's Option type.

First, let's look at how to access and update the SAM tags. There are a few different ways:

  1. The attributes() method returns a Map from all tags to their respective values. This is great if we want all of the tags at once, but the Map is rebuilt on every call, so we will likely want to cache (store locally) the return value. Furthermore, the values are not typed (they are of type Any), so we need to cast them, which means knowing their types ahead of time. Not good software engineering.
  2. The apply() method can be used to retrieve the typed value of a tag. For example, record.apply[String]("RG") looks up the value of the “RG” tag, returning it as a String. You can omit the .apply and simply write record[String]("RG"). The type (String) can even be omitted if it can be inferred elsewhere (e.g. val rg: String = record("RG")).  Note that if there is no “RG” tag present on the record, apply() will return null. Not great Scala when you don't already know whether the tag is present.
  3. The get() method is a better way to retrieve the value of a tag, since it returns Some when the tag is present and None otherwise. To look up the value of the “RG” tag we would write record.get[String]("RG"), which returns a value of type Option[String]. From there, we can do what we like (e.g. call foreach or map, or even pattern match).
  4. The update() method can be used to set or update the value of a SAM tag. To set the alignment score tag (“AS”), we could write record.update("AS", 42), or more succinctly record("AS") = 42. Neat, huh?
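To make the syntactic sugar behind these four methods concrete, here is a minimal sketch using a toy stand-in class. ToyRecord and TagDemo are hypothetical names invented for this illustration; this is not fgbio's SamRecord or its implementation, only a small class exposing methods with the same names so the examples above can be tried without a BAM file or any dependencies.

```scala
import scala.collection.mutable

/** Toy stand-in for a record with tags (hypothetical; NOT fgbio's SamRecord). */
final class ToyRecord {
  private val tags = mutable.Map.empty[String, Any]

  /** Untyped snapshot of all tags; a new Map is built on every call. */
  def attributes: Map[String, Any] = tags.toMap

  /** Typed lookup; yields null for a missing tag when A is a reference type. */
  def apply[A](tag: String): A = tags.getOrElse(tag, null).asInstanceOf[A]

  /** Safe typed lookup: Some(value) if the tag is present, None otherwise. */
  def get[A](tag: String): Option[A] = tags.get(tag).map(_.asInstanceOf[A])

  /** Sets or overwrites a tag; enables the `record(tag) = value` syntax. */
  def update(tag: String, value: Any): Unit = tags(tag) = value
}

object TagDemo {
  def main(args: Array[String]): Unit = {
    val record = new ToyRecord
    record("RG") = "group1"                  // sugar for record.update("RG", "group1")
    record("AS") = 42                        // sugar for record.update("AS", 42)

    val rg: String = record("RG")            // sugar for record.apply[String]("RG")
    println(rg)

    // Option-based access: no null checks needed.
    record.get[Int]("AS").foreach(as => println(s"AS = $as"))
    record.get[Int]("NM") match {
      case Some(nm) => println(s"NM = $nm")
      case None     => println("NM tag not present")
    }

    // attributes gives everything at once, but the values are untyped.
    val all: Map[String, Any] = record.attributes
    println(all.keys.toSeq.sorted.mkString(","))
  }
}
```

The design point is that get() pushes the "is the tag there?" question into the type system, while apply() leaves it to the caller to know in advance.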

Let's see it in action:

I am going to skip the description of the SimpleCounter and NumericCounter classes, but suffice it to say they are very useful for counting objects of any type (SimpleCounter) and for computing summary statistics when those objects are numeric (NumericCounter).

Here’s the output on running the tool on a BAM file I had lying around:
