This continues the series of posts meant to help you write concise and well-tested bioinformatic tools.
One common task is to retrieve, set, or update the SAM optional tags on each SAM record. The SAM optional fields (tags) specification lists predefined standard tags, though users can add their own (starting with X, Y, or Z, or containing a lowercase character). Let's look at how to access and update SAM tags in SamRecords.
If you're not familiar with Scala's apply or update methods, now is a good time to google them (for the lazy). Similarly, if you're not familiar with Scala Options, look those up too (no link for the lazy).
First, let's look at how to access the SAM tags. There are a few different ways:
- The attributes() method returns a Map from all tags to their respective values. This is great if we want all of the tags at once, but the Map is rebuilt on every call, so we likely need to cache (store locally) the return value. Furthermore, the values are not typed (they inherit from Any), so we need to cast them, which means knowing their types ahead of time. Not good software engineering.
- The apply() method can be used to retrieve the typed value of a tag. For example, record.apply[String]("RG") looks up the value of the “RG” tag, returning it as a String. You can omit the .apply and simply write record[String]("RG"). The type (String) can even be omitted if it can be inferred elsewhere (ex. val rg: String = record("RG")). Note that if there is no “RG” tag present on the record, apply() will return null. Not great Scala when you don't already know if the tag is present.
- The get() method is a better way to return the value for a tag, since it returns Some when the tag is present and None otherwise. To look up the value of the “RG” tag we would write record.get[String]("RG"), which returns a value of type Option[String]. From this, we can do what we like (ex. call foreach or map, or even pattern match).
- The update() method can be used to set or update the value of a SAM tag. To set the alignment score tag (“AS”), we could write record.update("AS", 42), or more succinctly record("AS") = 42. Neat huh?
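To make the four access patterns concrete without pulling in any dependencies, here is a minimal sketch built around a hypothetical stand-in class. TaggedRecord and TagDemo are names invented for illustration, and the tag values are made up. This is not the real SamRecord implementation; it only mirrors the behavior described above (a Map rebuilt on each call, null from apply() on a missing tag, an Option from get(), and the assignment sugar for update()).

```scala
import scala.collection.mutable

// Hypothetical stand-in for a SAM record that holds only optional tags.
// It is NOT the real SamRecord; it just mimics the semantics described above.
final class TaggedRecord {
  private val tags = mutable.LinkedHashMap.empty[String, Any]

  // Builds a new immutable Map on each call; values are untyped (Any).
  def attributes: Map[String, Any] = tags.toMap

  // Typed lookup; returns null for a missing tag (only meaningful for reference types).
  def apply[A](tag: String): A = tags.getOrElse(tag, null).asInstanceOf[A]

  // Typed lookup wrapped in an Option: Some(value) if present, None otherwise.
  def get[A](tag: String): Option[A] = tags.get(tag).map(_.asInstanceOf[A])

  // Sets or overwrites a tag; this is what enables the record("AS") = 42 syntax.
  def update(tag: String, value: Any): Unit = tags.update(tag, value)
}

object TagDemo {
  def main(args: Array[String]): Unit = {
    val record = new TaggedRecord
    record("RG") = "group1" // sugar for record.update("RG", "group1")
    record("AS") = 42

    val rg: String = record("RG")                    // type parameter inferred from the annotation
    val as: Option[Int] = record.get[Int]("AS")      // tag is present
    val nm: Int = record.get[Int]("NM").getOrElse(0) // tag is absent, so fall back to a default

    // Pattern matching on the Option avoids any null check.
    record.get[String]("RG") match {
      case Some(group) => println(s"read group: $group")
      case None        => println("no read group")
    }

    // Cache the Map once rather than rebuilding it for every lookup.
    val cached = record.attributes
    println(s"$rg $as $nm ${cached.size}")
  }
}
```

The sketch also shows why get() is the safer default: a missing tag becomes a None that has to be handled explicitly, rather than a null that surfaces later.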
Let's see it in action:
I am going to skip the description of the SimpleCounter and NumericCounter classes, but suffice it to say they are super useful classes for simple counting of objects of any type (SimpleCounter) and summary statistics when those objects are numeric types (NumericCounter).
Here’s the output on running the tool on a BAM file I had lying around: