[maker-devel] splitting genome for annotation

Daniel Lawson lawson at ebi.ac.uk
Thu Jun 27 07:37:10 MDT 2013


Michel,

What matters is the size of your scaffolds rather than the size of the whole
genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have
long scaffolds, then the compute time cannot drop below the time taken to
process the largest scaffold, however many chunks you split the rest into.
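
As a rough way to check this on your own assembly, here is a minimal sketch in
Python. It is not part of MAKER, and the function names scaffold_lengths and
balance_chunks are made up for illustration. It tallies scaffold lengths from
FASTA lines and greedily groups scaffolds into a chosen number of chunks,
assuming run time grows roughly with sequence length (an assumption, not a
measured property of MAKER).

```python
import heapq
from typing import Dict, Iterable, List


def scaffold_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {scaffold_id: sequence_length} from an iterable of FASTA lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def balance_chunks(lengths: Dict[str, int], n_chunks: int) -> List[List[str]]:
    """Greedy longest-first grouping: each scaffold goes to the lightest chunk.

    Scaffolds are never cut, so the heaviest chunk is always at least as
    large as the single longest scaffold.
    """
    heap = [(0, i) for i in range(n_chunks)]  # (total bases so far, chunk index)
    heapq.heapify(heap)
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    # Longest scaffolds first; ties broken by name to keep the result stable.
    for name, length in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return chunks
```

For example, passing an open file handle for a hypothetical genome.fasta to
scaffold_lengths and the result to balance_chunks with n_chunks=10 gives ten
lists of scaffold IDs. If one scaffold is much longer than the rest, it ends up
alone in its chunk, and that chunk sets the overall finishing time.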

regards
Dan


On 27 June 2013 14:33, <michel.moser at ips.unibe.ch> wrote:

> Dear Maker-developers
>
> If I understood correctly, one can split the genome into chunks and
> annotate each chunk separately in order to increase speed and reduce the
> resources needed.
> (I would really like to do that, as I am working with a 1.2 Gbp draft
> genome and can't use MPI on the computing cluster.)
> I am a bit worried about how this might affect the annotation, as the
> gene predictor would get trained quite differently for each chunk, right?
> Or is there communication between the chunks when using the -base option
> of MAKER?
>
> Could you maybe name some pros and cons of splitting your genome for the
> annotation with maker?
>
> Thank you very much,
> Michel
>
>
>
>
> ________________________________________
> From: Moser, Michel (IPS)
> Sent: Thursday, 27 June 2013 15:24
> To: Carson Holt
> Subject: RE: [maker-devel] start position for some genes results
>
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, 26 June 2013 04:02
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> The failure you are seeing occurs in the initialization stage, before
> reaching any of the changes that would have been introduced by 2.28. Try
> running the test data that comes with MAKER. Does it fail as well?
>
> --Carson
>
>
>
> From: Jingjing Jin <jjin01 at mail.rockefeller.edu>
> Date: Tuesday, 25 June, 2013 9:53 PM
> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
> Subject: RE: [maker-devel] start position for some genes results
>
> Yes, this is the real name.
>
> There is also no ":" in the name.
>
> I used the same file with maker.2.27 and had no problem.
>
> I am not sure what is wrong with the new version.
>
> Jingjing
>
>
> ________________________________
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Tuesday, June 25, 2013 9:47 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> Could you check your input genome file for the sequence
> "processed_tobacco_genome_sequences_c1"? Make sure that it has exactly
> that name and that there are no ':' characters in the name, because they
> can confuse the BioPerl FASTA indexer.
>
> --Carson
>
>
> From: Jingjing Jin <jjin01 at mail.rockefeller.edu>
> Date: Tuesday, 25 June, 2013 9:30 PM
> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
> Subject: RE: [maker-devel] start position for some genes results
>
> Dear Carson,
>
>
> I am so sorry. The problem is still here.
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore
>
> To access files for individual sequences use the datastore index:
>
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log
>
> STATUS: Now running MAKER...
> WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to
> re-index the fasta.
> stop here: processed_tobacco_genome_sequences_c1
> ERROR: Fasta index error
>  at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm
> line 239.
>         Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)',
> 'HASH(0x4e10810)', 0) called at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm
> line 73
>         Process::MpiTiers::__ANON__() called at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
>         eval {...} called at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
>         Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm
> line 79
>         Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)')
> called at
> /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm
> line 56
>         Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0,
> 'Process::MpiChunk') called at
> /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
> --> rank=NA, hostname=ChuaServer1
> ERROR: Failed in tier preparation
> WARNING: You must always set a rank before running MpiTiers
> FATAL: argument `seq_id` does not exist in MpiTier object
>
> ________________________________
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Tuesday, June 25, 2013 8:55 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> Delete the mpi_blastdb directory before starting, to make sure all indexes
> get rebuilt.  Also make sure you are not setting TMP= to a network mounted
> location.
>
> --Carson
>
>
> From: Jingjing Jin <jjin01 at mail.rockefeller.edu>
> Date: Tuesday, 25 June, 2013 8:53 PM
> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
> Subject: RE: [maker-devel] start position for some genes results
>
> Dear Carson,
>
> When I use the new version of maker, I have another problem like this:
>
> jingjing at ChuaServer1:~/project/$
> /home/jingjing/software/maker.2.28/maker/bin/./maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore
>
> To access files for individual sequences use the datastore index:
>
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log
>
> STATUS: Now running MAKER...
> WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to
> re-index the fasta.
> stop here: processed_tobacco_genome_sequences_c1
> ERROR: Fasta index error
>
>
> Do you know how to fix this problem with the new version?
>
> Thanks!
>
> Jingjing
>
>
>
> ________________________________
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Tuesday, June 25, 2013 6:55 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> What MAKER version are you using?  This should be fixed in the current
> 2.28.  It only happened under a very specific set of circumstances, but I
> remember fixing it. So let me know if you are using 2.28.
>
> --Carson
>
>
>
> From: Jingjing Jin <jjin01 at mail.rockefeller.edu>
> Date: Tuesday, 25 June, 2013 5:13 PM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] start position for some genes results
>
> Dear all,
>
> I found something strange about the locations in my final result.
>
> For example, the start position of this final gene model:
>
> c124062 maker   gene    -1      507     .       -       .
> ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2
>
>
> Its start position is -1.
>
> Does someone know why the start position is -1?
>
> Is there something wrong?
>
> Thanks!
>
> Jingjing
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>



-- 
Ensembl Genomes | VectorBase | i5K insect genome initiative