University of Maryland Mike P. Cummings  
Center for Bioinformatics and Computational Biology

The Lattice Project


Research Projects

Over the past few years, we have invited faculty, postdocs, graduate students, and others at the University of Maryland to use The Lattice Project for their research projects.

Working with various researchers, helping them organize and submit their jobs, and listening to their feedback has continually helped us improve the system and has shown us where more work is needed. Taken together, the projects we have supported are extremely diverse. The Lattice Project is cited in a number of publications that have come out of these studies. Here we provide general information about the types of analyses that have been run on Lattice, the applications used in those analyses, and specific projects. For more information, visit our main research page.


Phylogenetic Analysis - GARLI

The Cummings Laboratory and others are using GARLI to infer evolutionary relationships from DNA sequence data. More information to come.

Protein Sequence Comparison - HMMPfam

hmmpfam is part of the HMMER package, which uses profile hidden Markov models (HMMs) to characterize regions of similar amino acid sequence in protein families: groups of proteins with similar function found in related organisms. The hmmpfam program searches the sequences of proteins of unknown function against Pfam, a carefully curated set of HMMs built from well-understood protein families. Each protein sequence is assigned to one or more protein families on the basis of a statistically significant match to a Pfam HMM.
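
The assignment rule described above can be sketched in Python as a filter over search hits. This is a minimal illustration, not hmmpfam's code or output format: it assumes the hits have already been parsed into hypothetical (sequence ID, Pfam family, E-value) tuples, and the default E-value cutoff of 0.01 is an arbitrary placeholder for "statistically significant".

```python
from collections import defaultdict

# Hypothetical parsed hit: (sequence_id, pfam_family, e_value).
# Real hmmpfam output is a text report that would have to be parsed first.
Hit = tuple[str, str, float]

def assign_families(hits: list[Hit], max_evalue: float = 0.01) -> dict[str, list[str]]:
    """Assign each sequence to every family it matches with E-value <= max_evalue.

    A sequence may receive several families (e.g. a multi-domain protein)
    or none at all, in which case it is absent from the result.
    """
    assignments: dict[str, list[str]] = defaultdict(list)
    for seq_id, family, evalue in hits:
        if evalue > max_evalue:
            continue  # not significant under this (assumed) cutoff
        families = assignments[seq_id]
        if family not in families:
            families.append(family)
    return dict(assignments)
```

For example, a sequence with hits to two families at E-values of 1e-20 and 3e-5 would be assigned both, while a sequence whose only hit has an E-value of 0.5 would be left unassigned.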

HMMPfam and RMIDb:

The Edwards lab provides the Rapid Microorganism Identification Database (RMIDb - www.RMIDb.org), a freely available web resource and database for identifying bacteria and viruses by mass spectrometry. RMIDb searches protein sequences from all of the major protein sequence repositories, plus computational protein sequence predictions from sequenced bacterial genomes, for matches to the experimental masses observed in mass spectra. Protein sequences are carefully categorized by strain, species, and other taxonomic groupings, and by protein function, cellular location, and biological process, using the Pfam assignments computed by hmmpfam and their associated Gene Ontology (GO) classifications. The functional classification must be recomputed with hmmpfam because the various protein sequence sources use different, sometimes conflicting, criteria for Pfam assignment, or provide no assignment at all. Functional classification makes it possible to restrict the mass search to the proteins most likely to be observed, which decreases search time and increases the statistical significance of species identifications.
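
The mass-matching step, and the way functional filtering shrinks the set of candidate proteins, can be illustrated with a small Python sketch. Everything here is an assumption for illustration rather than RMIDb's actual method: the hypothetical `Protein` record and its `likely_observed` flag, the use of monoisotopic residue masses for an intact, unmodified protein chain, observed masses that are already converted to neutral masses, and the parts-per-million tolerance.

```python
from dataclasses import dataclass

# Monoisotopic residue masses (daltons) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per chain for the free N- and C-termini

@dataclass
class Protein:  # hypothetical record type, not RMIDb's schema
    accession: str
    sequence: str
    likely_observed: bool  # assumed to come from the Pfam/GO-based classification

def protein_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified protein chain."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mass_matches(observed: float, proteins: list[Protein],
                 tol_ppm: float = 50.0, only_likely: bool = True) -> list[str]:
    """Return accessions whose computed mass lies within tol_ppm of the observed mass."""
    # Functional filtering: optionally skip proteins unlikely to be observed.
    candidates = [p for p in proteins if p.likely_observed or not only_likely]
    tol = observed * tol_ppm / 1e6
    return [p.accession for p in candidates
            if abs(protein_mass(p.sequence) - observed) <= tol]
```

Filtering before matching means fewer candidate masses are compared against each spectrum peak, which is the source of both the speed-up and the smaller chance of spurious matches.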

HMMPfam for RMIDb on BOINC:

The Edwards laboratory is using the HMMPfam service to compute Pfam assignments for all bacterial, plasmid, and virus protein sequences from Swiss-Prot, TrEMBL, GenBank, RefSeq, and TIGR's CMR, plus an inclusive set of all plausible Glimmer predictions from RefSeq bacterial genomes. These protein sequences, and their Pfam assignments, are used in RMIDb. The HMMPfam service is also being used as a model for 'data-heavy' bioinformatics applications on the Lattice Grid infrastructure, a collaboration between the Cummings and Edwards laboratories.
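
One way to picture a "data-heavy" job on a grid is to split a very large sequence file into many independent work units, each small enough to ship to a separate computer and search on its own. The Python sketch below assumes FASTA input and a fixed number of sequences per unit; the function names and the default unit size are hypothetical, not the Lattice Project's actual work-unit format.

```python
from typing import Iterable, Iterator

def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)

def make_work_units(records: Iterable[tuple[str, str]],
                    per_unit: int = 1000) -> Iterator[str]:
    """Group records into FASTA text blocks of at most per_unit sequences each."""
    batch: list[str] = []
    for header, seq in records:
        batch.append(f">{header}\n{seq}\n")
        if len(batch) == per_unit:
            yield "".join(batch)
            batch = []
    if batch:  # the last unit may be smaller
        yield "".join(batch)
```

Because each unit is self-contained, units can be searched in any order and their results simply concatenated afterward.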

Conservation Reserve Network Design - MARXAN

MARXAN is a decision support system for the design of conservation reserve networks. It is useful for selecting, from a large number of potential sites, a reserve system that satisfies a number of ecological, social, and economic criteria. For example, certain species or conservation features must be well protected within the reserve system, or the reserve system must not include more than a specified number of sites. The user translates their criteria into representation targets for the conservation features to be protected (e.g., the number of populations of each species or the percentage of each habitat type to be included in the reserve system) and, optionally, a cost threshold or desired level of site compactness. MARXAN produces reserve network solutions that meet these design constraints while simultaneously minimizing the cost of the design (for example, the number of sites required to meet all representation targets).
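
To make the objective concrete, here is a heavily simplified Python sketch in the spirit of MARXAN: a score that adds the cost of the selected sites to a penalty for every unit by which a representation target is missed, minimized by simulated annealing over single-site additions and removals. It omits the compactness term and cost threshold, and the `penalty_weight`, cooling schedule, and dictionary data layout are assumptions for illustration, not MARXAN's implementation.

```python
import math
import random

def score(selected, cost, amounts, targets, penalty_weight):
    """Total cost of selected sites plus a penalty per unit of unmet target."""
    total = sum(cost[s] for s in selected)
    for feature, target in targets.items():
        held = sum(amounts[s].get(feature, 0) for s in selected)
        total += penalty_weight * max(0, target - held)
    return total

def anneal(cost, amounts, targets, penalty_weight=10.0, iters=5000, t0=5.0, seed=0):
    """Simulated annealing over reserve networks; returns the best set seen and its score."""
    rng = random.Random(seed)
    sites = sorted(cost)
    current = set(sites)  # start with every site selected
    cur = score(current, cost, amounts, targets, penalty_weight)
    best, best_score = set(current), cur
    for i in range(iters):
        temp = t0 * (1 - i / iters) + 1e-9  # linear cooling toward zero
        candidate = current ^ {rng.choice(sites)}  # toggle one site in or out
        cand = score(candidate, cost, amounts, targets, penalty_weight)
        delta = cand - cur
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur = candidate, cand
            if cur < best_score:
                best, best_score = set(current), cur
    return best, best_score
```

In a toy problem with four unit-cost sites, where site C holds species 1 and 2, site D holds species 3, and sites A and B each hold one of species 1 and 2, the cheapest network meeting every target is {C, D}, with a score of 2. Running many annealing replicates with different seeds, as in the study below, yields a family of near-optimal solutions rather than a single answer.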

Biased Data and the Selection of Conservation Reserve Networks:

Joanna Grand, Maile Neel, Michael Cummings (University of Maryland), Taylor Ricketts (World Wildlife Fund), and Tony Rebelo (South African National Biodiversity Institute) are collaborating on a project that uses MARXAN to quantify the impacts of basing the selection of conservation reserve networks on incomplete and biased species distribution data. Most species distribution data are biased in some way (e.g., higher sampling intensity closer to roads or within current reserves); however, they are commonly used to select sites for inclusion in reserve networks because they are considered the best data available. How well reserve networks protect biodiversity when sites are selected based on incomplete and biased data is poorly understood.

The first set of analyses compared the efficiency and effectiveness of MARXAN reserve network solutions generated from biased and complete species data. We used data from a virtually exhaustive survey of the Proteaceae family of flowering plants in the Cape Floristic Region of South Africa as our baseline for “complete” data. To produce a sufficient range of solutions for comparison with the complete-data solution, we simulated 1000 biased and random incomplete datasets from the full Proteaceae dataset. We then ran MARXAN 1000 times for each dataset. This study design required 1.2002 × 10⁷ (12,002,000) separate MARXAN runs, which could be completed in only a few weeks by running them asynchronously in parallel on the Lattice Grid system.
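
The contrast between "biased" and "random" incomplete datasets can be sketched as two subsampling schemes applied to the same complete occurrence records. In this hypothetical Python version, each record carries a distance to the nearest road; the biased scheme keeps a record with a probability that decays exponentially with that distance, while the random scheme keeps the same number of records chosen uniformly. The decay form, the `scale_km` parameter, and the record layout are assumptions, not the study's actual simulation procedure.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:  # hypothetical occurrence record
    species: str
    site: str
    road_km: float  # distance from the site to the nearest road

def biased_subsample(records, scale_km, rng):
    """Keep each record with probability exp(-road_km / scale_km)."""
    return [r for r in records if rng.random() < math.exp(-r.road_km / scale_km)]

def random_subsample(records, k, rng):
    """Keep k records chosen uniformly at random, without replacement."""
    return rng.sample(records, k)

def simulate(records, n_datasets, scale_km, seed=0):
    """Generate paired biased and random incomplete datasets of equal size."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_datasets):
        biased = biased_subsample(records, scale_km, rng)
        # Matching sizes isolates the effect of bias from the effect of incompleteness.
        pairs.append((biased, random_subsample(records, len(biased), rng)))
    return pairs
```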

Currently, we are investigating how well reserve networks protect species when their design is based on detailed species distribution data, which are often incomplete and biased, versus coarser environmental data, which are easier to acquire and unaffected by sampling bias. We will compare MARXAN solutions generated with complete, biased, and random species data to those generated with environmental data (vegetation classes) and with combinations of both data types. This analysis will require over 7.6 × 10⁷ (76 million) separate MARXAN runs and will again rely on the Lattice Grid system to make this enormous amount of processing feasible.
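
Because every MARXAN run is independent, a study of this size reduces to enumerating (data type, dataset, replicate) combinations and handing each one to the grid as a separate job that can finish in any order. The Python sketch below only illustrates that bookkeeping; the data-type labels, the job-ID format, and the per-job seed scheme are hypothetical placeholders, not the study's actual design.

```python
from itertools import product
from typing import Iterator, NamedTuple

class Job(NamedTuple):
    data_type: str   # hypothetical label, e.g. "biased" or "vegetation"
    dataset: int
    replicate: int
    seed: int

    @property
    def job_id(self) -> str:
        return f"{self.data_type}-d{self.dataset:04d}-r{self.replicate:04d}"

def enumerate_jobs(data_types: list[str], n_datasets: int,
                   n_replicates: int) -> Iterator[Job]:
    """Yield one independent MARXAN job per (data type, dataset, replicate)."""
    combos = product(data_types, range(n_datasets), range(n_replicates))
    for i, (dtype, d, r) in enumerate(combos):
        # A distinct seed per job lets each replicate explore a different solution.
        yield Job(dtype, d, r, seed=i)
```

With 1000 datasets and 1000 replicates, each data type alone contributes 10⁶ jobs, which is why asynchronous execution across many machines is what makes such designs practical.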


Copyright © 2008 The Lattice Project
Maintained by Adam Bazinet
Direct questions and comments to Michael Cummings