[maker-devel] iprscan and ipr_update_gff

Carson Holt carsonhh at gmail.com
Tue May 6 09:54:41 MDT 2014


Actually looking a little closer, it wouldn't even matter if the ID= and
Name= tags were different for the 'gene', because interproscan gives the
results for the transcripts (mRNA) and not the gene. So Dbxref still gets
populated correctly regardless.

--Carson
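
To illustrate the point that the InterProScan results are keyed on the transcripts, here is a minimal Python sketch (not part of MAKER, and not how ipr_update_gff is implemented) that lists protein IDs in a raw InterProScan file that match no mRNA ID= or Name= in a GFF3. It assumes the first tab-separated column of the raw file holds the protein ID and that protein IDs are the mRNA names; the function names and the sample records are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def mrna_names(gff_lines):
    """Collect ID= and Name= values from mRNA feature lines only."""
    names = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        for key in ("ID", "Name"):
            if key in attrs:
                names.add(attrs[key])
    return names


def unmatched_proteins(raw_lines, gff_lines):
    """Return sorted protein IDs from the raw file with no matching mRNA."""
    known = mrna_names(gff_lines)
    seen = set()
    for line in raw_lines:
        if line.strip():
            # Assumption: column 1 of the raw output is the protein ID.
            seen.add(line.split("\t", 1)[0].strip())
    return sorted(seen - known)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    gff = [
        "##gff-version 3\n",
        "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1;Name=g1\n",
        "c1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-mRNA-1;Name=g1-mRNA-1;Parent=g1\n",
    ]
    raw = ["g1-mRNA-1\tcrc\t33\tPfam\n", "g2-mRNA-1\tcrc\t40\tPfam\n"]
    print(unmatched_proteins(raw, gff))
```

An empty list would suggest every InterProScan entry has a transcript to attach to; anything listed is worth checking against a truncated or unmerged GFF3.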


On 5/6/14, 9:47 AM, "Carson Holt" <carsonhh at gmail.com> wrote:

>Ok. With the full file I can see what was causing the message.  It is
>a parsing bug that was happening in a few cases, and I've now fixed it.
>But you can ignore it, because it has no effect on the output.
>
>It would only be an issue if the ID= and Name= tags were different in the
>GFF3 for the gene feature lines (which is never true for MAKER's
>output).  It was correctly parsing the 'mRNA' Name and ID tags, but was
>sometimes having issues with the Name= tags for the 'gene' lines (but
>because they are redundant with ID= tag, the script still finds what it
>needs to add the Dbxref= tags).
>
>--Carson
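
For anyone who wants to confirm that ID= and Name= really are redundant on the gene lines of their own file, a minimal Python sketch follows. It is an independent check, not the logic of ipr_update_gff; the function names and the sample lines are hypothetical, and it assumes plain key=value attributes without URL-encoding.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def id_name_mismatches(gff_lines, feature_type="gene"):
    """Return (line_number, ID, Name) for features whose Name= is absent
    or differs from ID=. Line numbers are 1-based."""
    mismatches = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("Name") != attrs.get("ID"):
            mismatches.append((number, attrs.get("ID"), attrs.get("Name")))
    return mismatches


if __name__ == "__main__":
    # Hypothetical lines, for illustration only.
    gff = [
        "##gff-version 3\n",
        "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1;Name=g1\n",
        "c1\tmaker\tgene\t200\t300\t.\t+\t.\tID=g2\n",
    ]
    for number, feature_id, name in id_name_mismatches(gff):
        print(number, feature_id, name)
```

Passing feature_type="mRNA" runs the same check on the transcript lines, which are the ones the InterProScan results attach to.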
>
>
>On 5/6/14, 9:26 AM, "kdelmore at zoology.ubc.ca" <kdelmore at zoology.ubc.ca>
>wrote:
>
>>I just printed the first 20000 lines of the gff to send to you because it
>>was too large to send through email. I've included a dropbox link to the
>>full file below. I've also included a link to the final gff with dbx refs;
>>as I mentioned, it does seem to add them even with the error. If I run
>>ipr_update_gff twice, I get the warnings on the first run but not on the
>>second. Does that help diagnose the problem?
>>
>>The only other red flag I've encountered with maker was in including
>>external gff3 from geneid and sgp2. These gff3s failed validation at the
>>website suggested in the README file, with the warning message "cds:
>>non-unique id" for all cds, but maker didn't give me a warning and they
>>seem to be incorporated into the annotation fine.
>>
>>original gff
>>https://www.dropbox.com/s/nimoh605jdk9myx/6.gff
>>
>>final gff
>>https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta
>>
>>Thanks again for getting back to me.
>>
>>> The file you sent was missing the ##FASTA entry and all sequence at the
>>> bottom for example. Is that the way it is in the datastore?
>>>
>>> --Carson
>>>
>>>
>>> On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" <kdelmore at zoology.ubc.ca>
>>> wrote:
>>>
>>>>Thanks for your reply. I have not truncated the gff3. I'm using files from
>>>>the datastore that were written at the same time so I'm not sure how that
>>>>would happen. I split my multifasta before running it through maker and
>>>>have not merged the gff or protein.fasta for iprscan. That wouldn't be the
>>>>problem, would it?
>>>>
>>>>> You have entries in your interproscan output that aren't in your GFF3. Is
>>>>> your GFF3 file truncated?
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca"
>>>>> <kdelmore at zoology.ubc.ca>
>>>>> wrote:
>>>>>
>>>>>>Hi, I have a question about the interproscan scripts available with maker.
>>>>>>
>>>>>>I'm following the recommendations posted by Carson in Aug 2011 to
>>>>>>incorporate results from iprscan. I'm getting quite a few warning messages
>>>>>>with ipr_update_gff; they're all the same and suggest that there's no
>>>>>>value for $name. When I look through the updated gff, however, the dbxrefs
>>>>>>have been added. Is this something I should be worried about?
>>>>>>
>>>>>>I'm using iprscan version 5 and actually get some warning messages there
>>>>>>as well but again, the output looks alright. In addition, some of my
>>>>>>fastas don't get these warnings in iprscan and they still give me the
>>>>>>error with ipr_update_gff so I don't think that's the problem. I'm using
>>>>>>proteins from UniProt. My commands and errors are below. I've also
>>>>>>attached the first 20000 lines from my initial gff and raw file from
>>>>>>iprscan.
>>>>>>
>>>>>>Thanks, I really appreciate your continued support.
>>>>>>Kira
>>>>>>
>>>>>>###
>>>>>>
>>>>>>commands for interproscan scripts available in maker
>>>>>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff  > 6.domains.gff
>>>>>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
>>>>>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace
>>>>>>
>>>>>>error after last step (just an example, a ton of similar lines):
>>>>>>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
>>>>>>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
>>>>>>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
>>>>>>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.
>>>>>>
>>>>>>
>>>>>>###
>>>>>>
>>>>>>commands for interproscan 5
>>>>>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
>>>>>>  > interpro_6.out 2>&1
>>>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>>>>>>
>>>>>>error after first step:
>>>>>>04/05/2014 19:22:09:269 25% completed
>>>>>>04/05/2014 21:27:36:305 50% completed
>>>>>>04/05/2014 21:32:34:236 75% completed
>>>>>>04/05/2014 21:38:01:379 90% completed
>>>>>>2014-05-04 21:50:22,761
>>>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248]
>>>>>>WARN - At run completion, unable to delete temporary directory
>>>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
>>>>>>2014-05-04 21:50:22,908
>>>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253]
>>>>>>WARN - At run completion, unable to delete temporary directory
>>>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
>>>>>>04/05/2014 21:50:23:380 100% done:  InterProScan analyses completed
>>>>>>
>>>>>>error after second step:
>>>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>>>>>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
>>>>>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
>>>>>>2014-05-05 21:04:00,603
>>>>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run
>>>>>>completion, unable to delete temporary directory
>>>>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh





