'Known issues' |
There's note in the AMPHORA2 README files that states:
KNOWN ISSUESThe problem lies in the fact that AMPHORA2 uses RAxML to build trees, and the tree files produced by RAxML don't follow the Newick/New Hampshire or NHX format. To wit, RAxML puts edge labels in brackets after the branch length, e.g.:
Bioperl
AMPHORA2 has been tested on Bioperl 1.6.1. People have reported problems running AMPHORA2 on Bioperl 1.6.9. For example, the following error has been reported when running Phylotyping.pl:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: parse error: expected ; or ) or ,
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::TreeIO::NewickParser::parse_newick
/usr/local/share/perl/5.14.2/Bio/TreeIO/NewickParser.pm:195
STACK: Bio::TreeIO::newick::next_tree
/usr/local/share/perl/5.14.2/Bio/TreeIO/newick.pm:143
STACK: main::assign_phylotype
/usr/local/Bioinf/AMPHORA2/Scripts/Phylotyping.pl:154
STACK: /usr/local/Bioinf/AMPHORA2/Scripts/Phylotyping.pl:70
Downgrading Bioperl from 1.6.9 to 1.6.1 solves the problem.
((REF-YP_004290029-NC_015216:0.23840847960367719804[I116],REF-YP_004520254-NC_015574:0.18701622612251170286[I117]):0.03268127637536134139[I115],REF-YP_447902-NC_007681:0.32812155226299949407[I118])
Newick! |
(Okay, okay, there is this page that may allow for the format they use, but it certainly isn't generally agreed upon in the bioinformatics community, as evidenced by the fact that...)
The BioPerl module that parses Newick trees (Bio::TreeIO::NewickParser.pm) only recognizes brackets in the context of NHX format (i.e. [&&NHX:tag1=value1), and thus fails and bails when it hits these tags. Apparently, in version 1.6.1 of BioPerl, NewickParser.pm would drop these values in a 'bootstrap' tag, and AMPHORA2 takes advantage of this placement (or misplacement as the case may be) to track taxonomy throughout the tree.
Note that in the BioPerl HOWTO:Trees, it states that
A newly added capability (after 1.5.2) allows you to specify that the internal nodes id encode bootstrap values instead of IDs. Provide theThis, however, does not solve the AMPHORA2 issue.
-internal_node_id => 'bootstrap'
to Bio::TreeIO and it will automatically move the ids over to the bootstrap slot.
The solution suggested by the Wu lab (downgrade to BioPerl 1.6.1) is unsatisfying on several levels, thus, I came up with the following hack for NewickParser.pm:
} elsif ($state == 4) { # optional NHX tags
if($token =~ /\[\&\&NHX/) {
# careful: this regexp gets rid of all NHX wrapping in one step
$self->_start('nhx_tag');
$token =~ /\[\&\&NHX\:(\S+)\]/;
if ($1) {
# NHX may be empty, presumably at end of file, just before ";"
my @attributes = split ':', $1;
foreach my $attribute (@attributes) {
$attribute =~ s/\s+//;
my($key,$value) = split '=', $attribute;
$self->_start('tag_name');
$self->_chars($key);
$self->_end('tag_name');
$self->_start('tag_value');
$self->_chars($value);
$self->_end('tag_value');
}
}
$self->_end('nhx_tag');
$token = next_token(\$newick, ",);"); #move to , or )
} elsif ($token =~ /\[/) {
# This is a hack to make AMPHORA2 work
if ($token =~ /\[(\S+)\]/) {
$self->_start('bootstrap');
$self->_chars($1);
$self->_end('bootstrap');
}
$token = next_token(\$newick, ",);"); }
$state = 5;
} elsif ($state == 5) { # end node
(I shouldn't have to say this next bit, but I will anyway...)
Adding this code may cause unexpected errors in other use cases! Be careful!
Thank you, Martin, for writing this tool!
Pawer to the People! |
3 comments:
Hey Bill, nice easy fix! I've committed this to the bioperl repo, we'll try to get it into the next release should the testing hold up (so far it works fine).
Here's the commit.
Glad someone found this useful. I shan't comment on the 'correctness' of either the RAxML tree format, how AMPHORA bends it to its will, or how TreeIO handles Newick tree data. I just wanted AMPHORA2 to work. :)
Post a Comment