This continues the series of posts meant to help you write concise and well-tested bioinformatic tools.
A common task is to retrieve, set, or update the SAM optional tags on each SAM record. The SAM optional fields (tags) specification lists predefined standard tags, though users can add their own (starting with X, Y, Z, or a lowercase character). Let's look at how to access and update SAM tags in
If you're not familiar with Scala's apply and update methods, now is a good time to google them. The same goes for Scala Options (no link for the lazy).
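For readers who would rather not google, here is a minimal sketch of the two language features in question. The Scores class and all of its members are hypothetical and exist only for illustration; they are not part of any SAM library.

```scala
import scala.collection.mutable

object ApplyUpdateDemo {
  // Hypothetical toy class, used only to show the compiler's rewriting rules.
  final class Scores {
    private val values = mutable.Map.empty[String, Int]
    // scores("x") is rewritten by the compiler to scores.apply("x")
    def apply(name: String): Int = values(name)
    // scores("x") = 1 is rewritten to scores.update("x", 1)
    def update(name: String, value: Int): Unit = values(name) = value
    // Some(value) when the name is present, None otherwise
    def get(name: String): Option[Int] = values.get(name)
  }

  val scores = new Scores
  scores("AS") = 42                          // same as scores.update("AS", 42)
  val alignmentScore: Int = scores("AS")     // same as scores.apply("AS")

  // Working with an Option: transform it, supply a default, or pattern match.
  val doubled: Option[Int] = scores.get("AS").map(_ * 2)
  val missing: Int = scores.get("NM").getOrElse(-1)
  val described: String = scores.get("NM") match {
    case Some(v) => s"NM=$v"
    case None    => "NM is absent"
  }
}
```

The point of an Option is that the absent case shows up in the type, so the compiler nudges you to handle it rather than letting a null slip through.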
First, let's look at how to access the SAM tags. There are a few different ways:
The attributes() method returns a Map from all tags to their respective values. This is great if we want all of the tags at once, but the map is rebuilt on every call, so we likely need to cache (store locally) the return value. Furthermore, the values are not typed (they inherit from Any), so we need to cast them, which means knowing their type ahead of time. Not good software engineering.
The apply() method can be used to retrieve the typed value of a tag. For example, record.apply[String]("RG") looks up the value of the “RG” tag, returning it as a String. You can omit the .apply and simply write record[String]("RG"). The type (String) can even be omitted if it can be inferred elsewhere (e.g. val rg: String = record("RG")). Note that if there is no “RG” tag present on the record, the result is null. Not great Scala when you don’t already know if the tag is present.
The get() method is a better way to return the value for a tag, since it will return Some when the tag is present, None otherwise. To look up the value of the “RG” tag we would write record.get[String]("RG"), which returns a value of type Option[String]. From this, we can do what we like (e.g. call map, or even pattern match).
The update() method can be used to set or update the value of a SAM tag. To set the alignment score tag (“AS”), we could write record.update("AS", 42), or more succinctly record("AS") = 42. Neat, huh?
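To make the differences between the four accessors concrete, here is a minimal, self-contained sketch. ToyRecord is a hypothetical stand-in written for this example, not the real record class, and its internals are an assumption; only the call shapes (attributes, apply, get, update) mirror what is described above.

```scala
import scala.collection.mutable

object TagAccessDemo {
  // Hypothetical stand-in for a SAM record; NOT the real record class.
  final class ToyRecord {
    private val tags = mutable.LinkedHashMap.empty[String, Any]

    // Untyped snapshot of all tags, rebuilt on every call
    def attributes: Map[String, Any] = tags.toMap

    // Typed lookup; yields null when the tag is absent
    def apply[A](tag: String): A = tags.getOrElse(tag, null).asInstanceOf[A]

    // Safe lookup: Some(value) when present, None otherwise
    def get[A](tag: String): Option[A] = tags.get(tag).map(_.asInstanceOf[A])

    // Set or overwrite a tag
    def update(tag: String, value: Any): Unit = tags(tag) = value
  }

  val record = new ToyRecord
  record("RG") = "sample1"                 // same as record.update("RG", "sample1")
  record("AS") = 42

  // Untyped access: the caller has to know the type and cast.
  val fromMap: Int = record.attributes("AS").asInstanceOf[Int]

  // Typed access via apply, with and without the explicit type parameter.
  val rg: String = record[String]("RG")
  val inferred: String = record("RG")
  val score: Int = record[Int]("AS")

  // A missing tag: apply hands back null, get hands back None.
  val missing: String = record[String]("XX")
  val safe: Option[String] = record.get[String]("XX")

  // Pattern matching on the Option returned by get.
  val label: String = record.get[String]("RG") match {
    case Some(group) => s"read group $group"
    case None        => "no read group"
  }
}
```

In this sketch the get form is the only one where the possibility of a missing tag is visible in the return type, which is why it is the safer default when you don't control the input.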
Let's see it in action:
I am going to skip the description of the SimpleCounter and NumericCounter classes, but suffice it to say they are super useful classes for simple counting of objects of any type (SimpleCounter) and summary statistics when those objects are numeric types (NumericCounter).
Here’s the output on running the tool on a BAM file I had lying around: