[GET-dev] GET-Evidence "metadata only"

Fri Aug 17 11:50:37 EDT 2012

Hi Tom,

I was looking at adding new types of metadata to GET-Evidence's server, and
I see you added an option to run genome_analyzer.py in "metadata only" mode,
which skips the other parts of the analysis pipeline (see "if
options.metadata_only:" in
https://github.com/tomclegg/get-evidence/commit/cee0dc413a02ac2020a9197742f76b95cdbacc18
).

I was thinking it would be nice to add metadata for the "number of
variants predicted to cause coding changes" (i.e. nonsynonymous SNPs).
That would mean the data has to pass through predict_nonsynonymous. The
metadata could either be recorded by that module itself or, if get_metadata
runs after predict_nonsynonymous, get_metadata could detect the amino_acid
change predictions added to that module's output.
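
To make the second option concrete, here is a rough sketch of a pass-through
counter that could sit after predict_nonsynonymous. It is only an
illustration: the function name count_coding_changes, the metadata key
"coding_change_variants", and the assumption that records are tab-separated
lines whose ninth column holds ';'-separated attributes including
"amino_acid ..." are all hypothetical, not taken from the actual code.

```python
def count_coding_changes(records, metadata):
    """Yield each record unchanged while counting amino_acid annotations.

    Assumption (hypothetical format): each record is a tab-separated line
    whose ninth column holds ';'-separated attributes, and a predicted
    coding change appears as an attribute starting with "amino_acid ".
    """
    metadata.setdefault("coding_change_variants", 0)
    for line in records:
        fields = line.rstrip("\n").split("\t")
        attributes = fields[8] if len(fields) > 8 else ""
        if any(attr.strip().startswith("amino_acid ")
               for attr in attributes.split(";")):
            metadata["coding_change_variants"] += 1
        yield line


# Made-up sample records, for illustration only.
sample = [
    "chr1\tCGI\tSNP\t100\t100\t.\t+\t.\talleles A/G;amino_acid GENE1 R123H\n",
    "chr1\tCGI\tSNP\t200\t200\t.\t+\t.\talleles C/T\n",
    "chr2\tCGI\tSNP\t300\t300\t.\t+\t.\talleles G/T;amino_acid GENE2 L45P\n",
]
metadata = {}
passed_through = list(count_coding_changes(sample, metadata))
```

Because the counter is a generator, it would not change the records flowing
to later stages; it only fills in the metadata dictionary as a side effect.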

I think there may be other cases where we'd like modules in the genome
processing pipeline to record and return metadata as they run. A
"metadata only" run that skips those modules really limits the types of
metadata we can collect.

I'm guessing this was added so you could do "one chromosome at a time"
runs, to improve efficiency. (I'm not really sure how it works: how are the
runs recombined? Could the metadata be recombined as well?) I'm wondering
whether separated-by-chromosome runs are worth the complexity they add to
the code. What do you think?
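
If per-chromosome runs stay, recombining purely additive metadata could be
fairly simple. A minimal sketch, assuming each run returns a dictionary of
integer counts (merge_metadata and the key names are hypothetical, and I
don't know how the runs are actually recombined today):

```python
from collections import Counter


def merge_metadata(parts):
    """Sum integer-valued metadata from several per-chromosome runs."""
    total = Counter()
    for part in parts:
        total.update(part)  # Counter.update adds counts key by key
    return dict(total)


# Made-up per-chromosome results, for illustration only.
per_chromosome = [
    {"variants": 10, "coding_change_variants": 2},
    {"variants": 5, "coding_change_variants": 1},
]
merged = merge_metadata(per_chromosome)
```

Metadata that is not a simple count (for example, a genome build name)
would need its own merge rule, which is part of the added complexity.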

Thanks!

Madeleine