Welcome to the Yandell Lab, located in the Eccles Institute of Human Genetics on the campus of the University of Utah's Health Sciences Center.

Current Research in the Yandell Lab:

Sequenced genomes contain a treasure trove of information about how genes function and evolve. Getting at this information, however, is challenging and requires novel approaches that combine computer science and experimental molecular biology. My lab works at the intersection of both domains, and research in our group can be summarized as follows: we generate hypotheses concerning gene function and evolution by computational means and then, working with our collaborators, test these hypotheses at the bench. This is easier said than done, as serious barriers still exist to using sequenced genomes and their annotations as starting points for experimental work. Some of these barriers lie in the computational domain, others in the experimental. Though challenging, overcoming these barriers offers exciting training opportunities in both computer science and molecular genetics, especially for those seeking a future at the intersection of both fields. Ongoing projects in the lab are centered on genome annotation and comparative genomics; exploring the relationships between sequence variation and human disease; and applications of metagenomics to understanding infectious disease.

More About Research Interests...

Selected Recent Publications:

Specialized insulin is used for chemical warfare by fish-hunting cone snails

Safavi-Hemami H Gajewiak J Karanth S Robinson SD Ueberheide B Douglass AD Schlegel A Imperial JS Watkins M Bandyopadhyay PK Yandell M Li Q Purcell AW Norton RS Ellgaard L Olivera BM

Proc Natl Acad Sci U S A. 2015 Feb 10;112(6):1743-8

DisAp-dependent striated fiber elongation is required to organize ciliary arrays

Galati DF Bonney S Kronenberg Z Clarissa C Yandell M Elde NC Jerka-Dziadosz M Giddings TH Frankel J Pearson CG

J Cell Biol. 2014 Dec 22;207(6):705-15

Transposable element islands facilitate adaptation to novel environments in an invasive species

Schrader L Kim JW Ence D Zimin A Klein A Wyschetzki K Weichselgartner T Kemena C Stökl J Schultner E Wurm Y Smith CD Yandell M Heinze J Gadau J Oettler J

Nat Commun. 2014 Dec 16;5:5495

Genome Annotation and Curation Using MAKER and MAKER-P

Campbell MS Holt C Moore B Yandell M

Curr Protoc Bioinformatics. 2014 Dec 12;48:4.11.1-4.11.39

Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes

Law M Childs KL Campbell MS Stein JC Olson AJ Holt C Panchy N Lei J Jiao D Andorf CM Lawrence CJ Ware D Shiu SH Sun Y Jiang N Yandell M

Plant Physiol. 2015 Jan;167(1):25-39

Identifying rare variants for genetic risk through a combined pedigree and phenotype approach: application to suicide and asthma

Darlington TM Pimentel R Smith K Bakian AV Jerominski L Cardon J Camp NJ Callor WB Grey T Singleton M Yandell M Renshaw PF Yurgelun-Todd DA Gray D Coon H

Transl Psychiatry. 2014 Oct 21;4:e471

Transiently active Wnt/β-catenin signaling is not required but must be silenced for stem cell function during muscle regeneration

Murphy MM Keefe AC Lawson JA Flygare SD Yandell M Kardon G

Stem Cell Reports. 2014 Sep 9;3(3):475-88

Gibbon genome and the fast karyotype evolution of small apes

Carbone L Harris RA Gnerre S Veeramah KR Lorente-Galdos B Huddleston J Meyer TJ Herrero J Roos C Aken B Anaclerio F Archidiacono N Baker C Barrell D Batzer MA Beal K Blancher A Bohrson CL Brameier M Campbell MS Capozzi O Casola C Chiatante G Cree A Damert A de Jong PJ Dumas L Fernandez-Callejo M Flicek P Fuchs NV Gut I Gut M Hahn MW Hernandez-Rodriguez J Hillier LW Hubley R Ianc B Izsvák Z Jablonski NG Johnstone LM Karimpour-Fard A Konkel MK Kostka D Lazar NH Lee SL Lewis LR Liu Y Locke DP Mallick S Mendez FL Muffato M Nazareth LV Nevonen KA O'Bleness M Ochis C Odom DT Pollard KS Quilez J Reich D Rocchi M Schumann GG Searle S Sikela JM Skollar G Smit A Sonmez K ten Hallers B Terhune E Thomas GW Ullmer B Ventura M Walker JA Wall JD Walter L Ward MC Wheelan SJ Whelan CW White S Wilhelm LJ Woerner AE Yandell M Zhu B Hammer MF Marques-Bonet T Eichler EE Fulton L Fronick C Muzny DM Warren WC Worley KC Rogers J Wilson RK Gibbs RA

Nature. 2014 Sep 11;513(7517):195-201

Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation

Wegrzyn JL Liechty JD Stevens KA Wu LS Loopstra CA Vasquez-Gross HA Dougherty WM Lin BY Zieve JJ Martínez-García PJ Holt C Yandell M Zimin AV Yorke JA Crepeau MW Puiu D Salzberg SL Dejong PJ Mockaitis K Main D Langley CH Neale DB

Genetics. 2014 Mar;196(3):891-909

Software:

WHAM

WHole-genome Alignment Metrics (WHAM) is a structural variant (SV) caller that integrates several sources of mapping information to identify SVs. WHAM classifies SVs using a flexible and extendable machine-learning algorithm (random forest). WHAM is not only accurate at identifying SVs, but its association test can identify shared SVs enriched in a cohort of diseased individuals compared to a background of healthy individuals.

pVAAST

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Pedigree-VAAST (pVAAST) is a disease-gene identification tool designed for high-throughput sequence data in pedigrees.

PHEVOR

Phevor integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles. Phevor works by combining knowledge resident in multiple biomedical ontologies with the outputs of variant prioritization tools. It does so using an algorithm that propagates information across and between ontologies. This process enables Phevor to accurately reprioritize potentially damaging alleles identified by variant prioritization tools in light of gene function, disease, and phenotype knowledge.

GPAT

The application of population genomics to non-model organisms is greatly facilitated by the low cost of next-generation sequencing (NGS). Barriers, however, exist to using NGS data for population-level analyses. Traditional population genetic metrics, such as Fst, are not robust to the genotyping errors inherent in noisy NGS data. Additionally, many older software tools were never designed to handle the volume of data produced by NGS pipelines. To overcome these limitations we have developed a flexible software library designed specifically for large and noisy NGS datasets. The Genotype Phenotype Association Toolkit (GPAT) implements both traditional and novel population genetic methods in a single user-friendly framework. GPAT consists of a suite of compiled tools and a Perl API that programmers can use to develop new applications. To date GPAT has been used successfully to identify genotype-phenotype associations in several real-world datasets, including domestic pigeons, Pox virus and pine rust fungus. GPAT is open source and freely available for academic use.

GPA++ is a C++ extension of the Genotype Phenotype Association Toolkit. The Perl implementation of GPA has more bells and whistles than GPA++, but lacks speed.
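For readers unfamiliar with Fst, the following is a minimal sketch of the textbook (Wright) definition for a single biallelic site in two equally sized populations. It is not taken from GPAT or GPA++; the function names are hypothetical and chosen only for illustration.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Wright's Fst = (H_T - H_S) / H_T for two equally sized populations.

    p1 and p2 are the frequencies of the same allele in each population.
    This is the textbook estimator, not the method implemented in GPAT.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = heterozygosity(p_bar)  # heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # the site is monomorphic overall, so there is no differentiation to measure
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_two_populations(0.2, 0.8))
```

Because the estimate depends directly on the allele frequencies, miscalled genotypes that shift p1 or p2 also shift the result, which is the sensitivity to noisy NGS data described above.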

VAAST

VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds upon existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and non-coding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology.

MAKER

MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER is also easily trainable: outputs of preliminary runs can be used to automatically retrain its gene prediction algorithm, producing higher-quality gene models on subsequent runs. MAKER's inputs are minimal and its outputs can be directly loaded into a GMOD database. They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER should prove especially useful for emerging model organism projects with minimal bioinformatics expertise and computer resources.

Features

mRNAseq data in your annotations (use output from programs like TopHat, CuffLinks, and others in MAKER)...
Annotate Eukaryotes and Prokaryotes...
Pass through any kind of evidence from any source via GFF3 pass-through...
Easy single command line dump of MAKER output to GMOD tools like JBrowse and Chado database...
Same directory parallelization - just start MAKER on top of itself, and it will share the workload with the new process...
MPI support - for parallelization on high performance computer clusters...
Auto recovery - start, stop, and restart MAKER at any time; MAKER will always just pick up where it left off...
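
The GFF3 pass-through feature listed above relies on evidence being supplied in GFF3, a tab-delimited format with nine columns per feature line. As a rough illustration of what such a record looks like, here is a minimal Python sketch that parses one line. It is not part of MAKER; the function name and the example record are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    if attrs != ".":
        # Column 9 holds semicolon-separated key=value pairs with URL-style escapes.
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # coordinates are 1-based and inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


if __name__ == "__main__":
    # Hypothetical evidence record, made up for this example.
    record = "contig1\texample_source\tmatch\t100\t250\t.\t+\t.\tID=hit1;Name=example%20hit"
    print(parse_gff3_line(record))
```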

More Software...

Yandell Lab

Department of Human Genetics - University of Utah