[GET-dev] GET-Evidence "metadata only"

Tom Clegg tom at clinicalfuture.com
Fri Aug 17 15:33:48 EDT 2012


On Fri, Aug 17, 2012 at 11:50 AM, Madeleine Ball <mpball at gmail.com> wrote:
> I'm guessing this was added so you could do "one chromosome at a time" runs,
> to improve efficiency. (Not really sure how it works -- how are the runs
> recombined? Maybe recombine metadata as well?) I'm wondering if we think
> separated-by-chromosome runs are worth it, given the added complication it
> makes for the code? What do you think?

That's exactly right.  The rationale behind processing each chromosome
independently and then merging the results is that it's an easy way to
take advantage of multi-core hardware and reduce turnaround time.
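
As a rough illustration of the idea (not the actual GET-Evidence code),
here is a minimal Python sketch. It assumes the input is tab-separated
text whose first column is the chromosome name. The function names and
the record-counting worker are hypothetical stand-ins for the real
per-chromosome analysis.

```python
from collections import defaultdict
from multiprocessing import Pool


def split_by_chromosome(lines):
    """Group tab-separated records by first column (assumed: chromosome)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom = line.split("\t", 1)[0]
        groups[chrom].append(line)
    return dict(groups)


def analyze_chromosome(item):
    """Hypothetical stand-in for the real per-chromosome analysis."""
    chrom, records = item
    return chrom, len(records)


def analyze_in_parallel(lines, processes=None):
    """Run analyze_chromosome once per chromosome across worker processes."""
    groups = split_by_chromosome(lines)
    # processes=None lets Pool default to the number of available cores.
    with Pool(processes) as pool:
        return dict(pool.map(analyze_chromosome, sorted(groups.items())))
```

Each chromosome becomes one independent task, so turnaround time is
bounded roughly by the largest chromosome rather than the whole genome.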

Currently the "merge" step just concatenates the get-evidence.json and
get-ev_genes.json files.
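
For reference, a "cat"-style merge can be sketched in Python as below.
The one-directory-per-chromosome layout and the function name are
assumptions made for illustration only.

```python
import shutil
from pathlib import Path


def concatenate_outputs(chrom_dirs, filename, dest):
    """Append each per-chromosome copy of `filename` to `dest`, in order."""
    with open(dest, "wb") as out:
        for chrom_dir in chrom_dirs:
            part = Path(chrom_dir) / filename
            if not part.exists():
                continue  # assumed: a chromosome may produce no output
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

It would be called once per output file, e.g. once for
get-evidence.json and once for get-ev_genes.json.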

If we merge the per-chromosome metadata, we can eliminate the separate
nsSNP-unaware --metadata-only step.  It will be more complicated than
"cat" but not by much.
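
A metadata merge could look something like the Python sketch below. The
actual metadata fields are not listed here, so the merge rules are
assumptions: numeric counters are summed, lists are combined without
duplicates, and any other value must agree across chromosomes. The
field names in the usage note are likewise hypothetical.

```python
def _is_number(value):
    """True for int/float but not bool (bool is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def merge_metadata(parts):
    """Combine per-chromosome metadata dicts into one genome-wide dict."""
    merged = {}
    for part in parts:
        for key, value in part.items():
            if key not in merged:
                # Copy lists so the input dicts are never mutated.
                merged[key] = list(value) if isinstance(value, list) else value
                continue
            current = merged[key]
            if _is_number(current) and _is_number(value):
                merged[key] = current + value  # assumed: counters add up
            elif isinstance(current, list) and isinstance(value, list):
                for item in value:
                    if item not in current:
                        current.append(item)
            elif current != value:
                raise ValueError(
                    f"conflicting values for {key!r}: {current!r} vs {value!r}"
                )
    return merged
```

For example, merging {"chromosomes": ["chr1"], "called_num": 10} with
{"chromosomes": ["chr2"], "called_num": 5} under these rules gives
{"chromosomes": ["chr1", "chr2"], "called_num": 15}.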

Running one chromosome at a time does make a big difference to
processing time, so I think it's worthwhile to fix and maintain
chromosome-at-a-time mode.

Thanks,
Tom



