[maker-devel] maker_functional_gff Error
Carson Holt
carsonhh at gmail.com
Mon Apr 22 12:50:27 CDT 2019
This “WARNING: No mapping available for ThuMac01937-RA” means you are running on a file that has already been renamed. A file that has not been renamed yet has names like maker-SDFGDG-gene-0.1-mRNA-1, for example. Here the script is finding the name ThuMac01937-RA, which is not in the first column of the map file, so it throws a warning.
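One way to confirm this before rerunning anything is to compare the IDs in the data file against both columns of the map file. The following is only a minimal sketch: it assumes the map file is two whitespace-separated columns (original name, then new name) and that the ID is the first column of the data file, as in the blastp and iprscan output shown in the quoted message. The function name classify_ids is hypothetical and not part of MAKER.

```python
from collections import Counter


def classify_ids(map_lines, data_lines):
    """Count data-file IDs by where they appear in the map file.

    Assumes (not confirmed by MAKER docs here) that each map line is
    'original_id <whitespace> new_id' and that each data line starts
    with the ID to be checked.
    """
    old_ids, new_ids = set(), set()
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 2:
            old_ids.add(fields[0])
            new_ids.add(fields[1])

    counts = Counter()
    for line in data_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        seq_id = line.split()[0]
        if seq_id in old_ids:
            counts["original"] += 1   # still needs renaming
        elif seq_id in new_ids:
            counts["renamed"] += 1    # already renamed, would trigger the warning
        else:
            counts["unknown"] += 1    # in neither column of the map
    return counts
```

For example, `classify_ids(open("genome.all.map"), open("output.renamed.iprscan"))` returning mostly "renamed" would be consistent with the file having been mapped once already.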
The second one: Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3.
You likely have a truncated line in the GFF3; it’s missing an ID= tag. This can sometimes happen when writing to network-mounted (NFS) file systems because of an asynchronous IO error. NFS file systems have a performance enhancement where they return SUCCESS on IO operations immediately and then complete the IO operation later in the background. This improves speed by letting the program advance without blocking on the IO operation, but it reduces reliability: if the later operation does not actually succeed, it can’t go back and tell the program “never mind, it failed.” The result is a silent truncation of data. Not super common, but not all that rare either, depending on IO load (e.g. heavy MPI with lots of writes). Find the line that’s truncated, then rerun just that contig before building the merged GFF3 for everything.
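To locate the truncated line, something along these lines may help. It is a rough sketch, not a MAKER tool: it assumes every feature line should have 9 tab-separated columns and an ID= entry in the last column, and it stops at a ##FASTA directive if one is present. The function name find_suspect_gff3_lines is made up for illustration.

```python
def find_suspect_gff3_lines(lines):
    """Return (line_number, seqid, reason) for lines that look truncated.

    Assumption: every feature line has 9 tab-separated columns and
    carries an ID= attribute in column 9.
    """
    suspects = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, and directives
        cols = line.split("\t")
        seqid = cols[0]
        if len(cols) != 9:
            suspects.append((lineno, seqid, f"{len(cols)} columns instead of 9"))
        elif not any(a.startswith("ID=") for a in cols[8].split(";")):
            suspects.append((lineno, seqid, "no ID= attribute"))
    return suspects
```

Running it as `find_suspect_gff3_lines(open("genome.all.renamed.gff"))` gives the line numbers to inspect, and the seqid in each result is the contig that would need to be rerun.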
—Carson
> On Apr 18, 2019, at 3:23 AM, Paul Sheridan <paul at tupac.bio> wrote:
>
> Dear MAKER Team,
>
> I am running MAKER 2.31.10 on a 32-core instance. I followed the Post Processing of Annotations steps as described in the MAKER Tutorial for GMOD Online Training 2014 as best I could, but I get an error when I run maker_functional_gff. The commands in the order of execution and relevant output are shown below.
>
> Where did I go wrong?
>
> # run blastp command
> blastp -query genome.all.maker.proteins.fasta -db uniprot_sprot.fasta -num_threads 32 -evalue 1e-6 -max_hsps 1 -max_target_seqs 1 -outfmt 6 -out output.blastp
>
> # run interproscan command
> interproscan.sh -appl pfam -dp -f TSV -goterms -iprlookup -pa -t p -i genome.all.maker.proteins.fasta -o output.iprscan
>
> # create naming table
> maker_map_ids --prefix ThuMac --justify 5 genome.all.gff > genome.all.map
>
> # copy files for safe keeping
> cp genome.all.gff genome.all.renamed.gff
> cp genome.all.noseq.gff genome.all.noseq.renamed.gff
> cp genome.all.maker.proteins.fasta genome.all.maker.proteins.renamed.fasta
> cp genome.all.maker.proteins.aed.0.50.fasta genome.all.maker.proteins.aed.0.50.renamed.fasta
> cp genome.all.maker.unique.proteins.aed.0.50.fasta genome.all.maker.unique.proteins.aed.0.50.renamed.fasta
> cp genome.all.maker.transcripts.fasta genome.all.maker.transcripts.renamed.fasta
> cp genome.all.maker.transcripts.aed.0.50.fasta genome.all.maker.transcripts.aed.0.50.renamed.fasta
> cp output.iprscan output.renamed.iprscan
> cp output.blastp output.renamed.blastp
>
> # replace uninformative MAKER protein/transcript names with useful ones
> map_gff_ids genome.all.map genome.all.renamed.gff
> map_gff_ids genome.all.map genome.all.noseq.renamed.gff
> map_fasta_ids genome.all.map genome.all.maker.proteins.renamed.fasta
> map_fasta_ids genome.all.map genome.all.maker.proteins.aed.0.50.renamed.fasta
> map_fasta_ids genome.all.map genome.all.maker.unique.proteins.aed.0.50.renamed.fasta
> map_fasta_ids genome.all.map genome.all.maker.transcripts.renamed.fasta
> map_fasta_ids genome.all.map genome.all.maker.transcripts.aed.0.50.renamed.fasta
> map_data_ids genome.all.map output.renamed.iprscan
> map_data_ids genome.all.map output.renamed.blastp
>
> # assign annotations
> maker_functional_gff uniprot_sprot.db output.renamed.blastp genome.all.renamed.gff > genome.all.renamed.putative_function.gff
>
> > head output.renamed.blastp
> ThuMac30929-RA P20036 41.791 134 77 1 326 458 113 246 9.51e-28 114
> ThuMac19623-RA P81018 35.714 168 87 2 1 147 1 168 8.40e-33 117
> ThuMac19629-RA Q66I51 68.939 264 79 2 1 263 1 262 1.48e-130 372
> ThuMac19628-RA Q61464 55.172 87 37 1 766 852 382 466 4.42e-25 119
> ThuMac19627-RA P07898 48.276 58 29 1 13 69 1962 2019 3.60e-13 65.9
> ThuMac19626-RA P81018 36.782 174 96 2 21 180 1 174 5.75e-36 127
> ThuMac19624-RA P81018 35.057 174 99 2 21 180 1 174 2.19e-33 120
> ThuMac19625-RA Q28343 32.520 123 43 2 35 117 2123 2245 7.57e-17 78.6
> ThuMac19636-RA Q9QX29 90.909 110 10 0 5 114 458 567 6.45e-65 216
> ThuMac19638-RA Q9QX29 57.391 115 35 3 5 114 703 808 3.06e-28 120
>
> > head output.renamed.iprscan
> ThuMac08407-RA f1e60af0e3add9ce493bd7a78114da1e 631 Pfam PF00520 Ion transport protein 154 413 3.8E-21 T 18-04-2019 IPR005821 Ion transport domain GO:0005216|GO:0006811|GO:0016020|GO:0055085
> ThuMac08407-RA f1e60af0e3add9ce493bd7a78114da1e 631 Pfam PF08412 Ion transport protein N-terminal 109 152 5.1E-19 T 18-04-2019 IPR013621 Ion transport N-terminal Reactome: R-HSA-1296061
> ThuMac08407-RA f1e60af0e3add9ce493bd7a78114da1e 631 Pfam PF00027 Cyclic nucleotide-binding domain 519 601 1.0E-17 T 18-04-2019 IPR000595 Cyclic nucleotide-binding domain
> ThuMac24094-RA f3c3ae9be61177558ac12f745bd0dd8e 414 Pfam PF13765 SPRY-associated domain 235 283 8.9E-23 T 18-04-2019 IPR006574 SPRY-associated
> ThuMac24094-RA f3c3ae9be61177558ac12f745bd0dd8e 414 Pfam PF00643 B-box zinc finger 18 56 5.2E-12 T 18-04-2019 IPR000315 B-box-type zinc finger GO:0008270
> ThuMac24094-RA f3c3ae9be61177558ac12f745bd0dd8e 414 Pfam PF00622 SPRY domain 287 391 2.2E-14 T 18-04-2019 IPR003877 SPRY domain GO:0005515
> ThuMac08369-RA 7aee1da5a47975ab8e43b68bfd1a117c 139 Pfam PF00076 RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain) 22 87 1.6E-15 T 18-04-2019 IPR000504 RNA recognition motif domain GO:0003676
> ThuMac26054-RA 8f4119609312bd6442f8bb094c104231 462 Pfam PF07565 Band 3 cytoplasmic domain 173 443 7.3E-100 T 18-04-2019 IPR013769 Band 3 cytoplasmic domain GO:0006820|GO:0008509|GO:0016021 Reactome: R-HSA-425381
> ThuMac07958-RA d2b749fa573a5e452cadee56090c9588 804 Pfam PF03372 Endonuclease/Exonuclease/phosphatase family 235 535 7.0E-11 T 18-04-2019 IPR005135 Endonuclease/exonuclease/phosphatase
> ThuMac07958-RA d2b749fa573a5e452cadee56090c9588 804 Pfam PF17751 SKICH domain 555 649 9.8E-23 T 18-04-2019 IPR041611 SKICH domain
>
> > map_data_ids genome.all.map output.renamed.iprscan
> WARNING: No mapping available for ThuMac01937-RA
> WARNING: No mapping available for ThuMac02226-RA
> WARNING: No mapping available for ThuMac20730-RA
> WARNING: No mapping available for ThuMac20730-RA
> WARNING: No mapping available for ThuMac14750-RA
> (Thousands of warnings like these were returned)
>
> > maker_functional_gff uniprot_sprot.db output.renamed.blastp genome.all.renamed.gff > genome.all.renamed.putative_function.gff
> Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3.
>
> > head genome.all.renamed.putative_function.gff
> ##gff-version 3
> scf7180000008677_pilon_pilon . contig 1 49996 . . . ID=scf7180000008677_pilon_pilon;Name=scf7180000008677_pilon_pilon
>
> Thanks in Advance,
>
> Paul Sheridan
>
> --
> CSO at Tupac Bio
> Email: paul at tupac.bio
> Homepage: www.paulsheridan.net <http://www.paulsheridan.net/>
> Mobile: +81 80 7889 0859
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org