AMPHORA2 (like its predecessor
AMPHORA) is a great tool by
Martin Wu and Alexandra Scott that takes you directly from genomic or metagenomic sequence to phylogeny through identification and analysis of conserved, single-copy phylogenetic marker genes. I have found it to be super handy for assessing metagenomic assemblies, both during and after the binning process. There is one problem, however, that may be preventing some people from using it.
|
'Known issues' |
There's note in the AMPHORA2 README files that states:
KNOWN ISSUES
Bioperl
AMPHORA2 has been tested on Bioperl 1.6.1. People have reported problems running AMPHORA2 on Bioperl 1.6.9. For example, the following error has been reported when running Phylotyping.pl:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: parse error: expected ; or ) or ,
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::TreeIO::NewickParser::parse_newick
/usr/local/share/perl/5.14.2/Bio/TreeIO/NewickParser.pm:195
STACK: Bio::TreeIO::newick::next_tree
/usr/local/share/perl/5.14.2/Bio/TreeIO/newick.pm:143
STACK: main::assign_phylotype
/usr/local/Bioinf/AMPHORA2/Scripts/Phylotyping.pl:154
STACK: /usr/local/Bioinf/AMPHORA2/Scripts/Phylotyping.pl:70
Downgrading Bioperl from 1.6.9 to 1.6.1 solves the problem.
The problem lies in the fact that AMPHORA2 uses RAxML to build trees, and the tree files produced by RAxML don't follow the
Newick/New Hampshire or
NHX format. To wit, RAxML puts
edge labels in brackets after the branch length,
e.g.:
((REF-YP_004290029-NC_015216:0.23840847960367719804[I116],REF-YP_004520254-NC_015574:0.18701622612251170286[I117]):0.03268127637536134139[I115],REF-YP_447902-NC_007681:0.32812155226299949407[I118])
|
Newick! |
(Okay, okay, there is
this page that may allow for the format they use, but it certainly isn't generally agreed upon in the bioinformatics community, as evidenced by the fact that...)
The BioPerl module that parses Newick trees (
Bio::TreeIO::NewickParser.pm) only recognizes brackets in the context of NHX format (
i.e. [&&NHX:tag1=value1), and thus fails and bails when it hits these tags. Apparently, in version 1.6.1 of BioPerl, NewickParser.pm would drop these values in a 'bootstrap' tag, and AMPHORA2 takes advantage of this placement (or misplacement as the case may be) to track taxonomy throughout the tree.
Note that in the BioPerl
HOWTO:Trees, it states that
A newly added capability (after 1.5.2) allows you to specify that the internal nodes id encode bootstrap values instead of IDs. Provide the
-internal_node_id => 'bootstrap'
to Bio::TreeIO and it will automatically move the ids over to the bootstrap slot.
This, however, does not solve the AMPHORA2 issue.
The solution suggested by the Wu lab (downgrade to BioPerl 1.6.1) is unsatisfying on several levels, thus, I came up with the following
hack for NewickParser.pm:
} elsif ($state == 4) { # optional NHX tags
if($token =~ /\[\&\&NHX/) {
# careful: this regexp gets rid of all NHX wrapping in one step
$self->_start('nhx_tag');
$token =~ /\[\&\&NHX\:(\S+)\]/;
if ($1) {
# NHX may be empty, presumably at end of file, just before ";"
my @attributes = split ':', $1;
foreach my $attribute (@attributes) {
$attribute =~ s/\s+//;
my($key,$value) = split '=', $attribute;
$self->_start('tag_name');
$self->_chars($key);
$self->_end('tag_name');
$self->_start('tag_value');
$self->_chars($value);
$self->_end('tag_value');
}
}
$self->_end('nhx_tag');
$token = next_token(\$newick, ",);"); #move to , or )
} elsif ($token =~ /\[/) {
# This is a hack to make AMPHORA2 work
if ($token =~ /\[(\S+)\]/) {
$self->_start('bootstrap');
$self->_chars($1);
$self->_end('bootstrap');
}
$token = next_token(\$newick, ",);"); }
$state = 5;
} elsif ($state == 5) { # end node
(I shouldn't have to say this next bit, but I will anyway...)
Adding this code may cause unexpected errors in other use cases! Be careful!
Thank you, Martin, for writing this tool!
|
Pawer to the People! |