OPENSEQ.org
Webserver is back! Please contact us if you notice any strange outputs.
INTRODUCTION
OPENSEQ (short for open sequence) was created to provide an open platform for sharing sequence-based analyses with the public.
  • Our first project involves constructing a maximum entropy statistical model to describe each protein family in the Pfam database. The model captures both conservation and co-evolution patterns in the family. The strength of measured co-evolution is strongly predictive of residue-residue contacts in the 3D structure of the protein. This means that we can use this information to better understand contacts in known structures and to predict contacts for protein families that have no structure. (Image on the left shows the ND1 protein of the electron transport chain with predicted contacts in yellow.)
  • The work done here is the result of the students and computing power of Prof. David Baker's Lab, though we would also like to thank the developers of HHsuite and HMMER and the folks who maintain the Pfam, PDB and UniProt databases.
NEWS
May 11, 2014
Update on Complexes
  • We've added support for Jackhmmer on our complexes web-server. You can see exactly how we implemented it for paired-alignment generation in the technical details section of our FAQ. The advantages/disadvantages of using Jackhmmer are described in the April 04, 2014 entry (below).

  • Since our publication, a similar, independent and very cool study has been posted to bioRxiv by Hopf et al. (folks at evfold.org):
    • Sequence co-evolution gives 3D contacts and structures of protein complexes [LINK][PDF]

  • We are working on making all the scripts we used in our recent study human-friendly. All scripts involved in alignment generation + GREMLIN analysis have already been posted in the FAQ section.
May 01, 2014
Complex Prediction!
  • An early version of our paper is out, showing that we can predict protein-protein (residue-residue) interactions using co-evolution data. [LINK][PDF]
  • We've uploaded all of our results and alignments online for your viewing pleasure. =]
  • We also have a beta webserver that will generate paired alignments for a given protein pair and run the GREMLIN analysis. If you encounter any errors or have suggestions, please use our contact form!

CASP11 starts TODAY!!

  • CASP11 is going to be the era of "contact prediction"; every group has some kind of contact predictor in their pipeline. To make things easier for folks interested in using GREMLIN results in CASP, submitted CASP sequences will automatically be organized on our CASP11 page. Good luck!!
April 04, 2014
Some updates to the submission page:
  • We've added support for Jackhmmer (from HMMER) for multiple sequence alignment generation.
    • Disadvantage: It is slower than HHblits, since Jackhmmer does not use the pre-clustered UniProt database.
    • Advantage: Since Jackhmmer does not require a pre-clustered UniProt DB (which is updated once a year), it can use the latest UniProt release (which is updated once a month).
    • If your favorite gene does not have enough homologous sequences to perform co-evolution analysis, you can try resubmitting the gene every month until it does! =P
  • For those of you interested in using co-evolution based contacts in the Rosetta modeling software, restraint files are now provided for each submission. We also provide a realignment and renumbering tool for those using the restraints with a sequence longer or shorter than the original query.
February 16, 2014
Some exciting new features have been added to the submissions output page! All previous submissions have been updated.
  • We've been working on adding homooligomer support. When you submit a job, contacts coming from other chains will now be highlighted in shades of red. The maximum HHsearch contact is now shown instead of the average (whether it is an intra-chain contact, an inter-chain contact, or comes from a different PDB hit).
  • Each PDB hit now has its own contact map, that you can click on.
  • We ran GREMLIN on all E. coli genes that have at least 1L sequences. I am working on creating a pretty intro page, but in the meantime you can see a sneak preview here: ECOLI
September 14, 2013
  • It's very exciting to see folks submitting jobs to the server. (We apologize to those who submitted sequences of ~1000 amino acids; those can take up to 24 hours to complete.) We've been fixing little bugs as they come up. If you encounter any errors, or the results page does not make sense, please write to us!
  • We now include a sequence conservation graphic, as depicted by WebLogo, for all our submissions. These can be useful because:
    1. A mutation to a functionally important residue will not be selected for if it cannot easily be compensated by a co-mutation. It will therefore not be observed in a multiple sequence alignment, and hence will not be captured by a co-evolution analysis.
    2. What may appear to be a highly variable/un-conserved position (based on a WebLogo representation) may actually be highly conserved and co-evolving with another position [as would be captured in the GREMLIN output].
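The two points above can be illustrated with a small sketch. It uses per-column Shannon entropy as a stand-in for what a WebLogo shows, and plain mutual information as a stand-in for a co-evolution score. Note that this is not the GREMLIN model, which is more involved; the toy alignment and all function names below are made up for illustration.

```python
import math
from collections import Counter

def column(alignment, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in alignment]

def entropy(symbols):
    """Shannon entropy in bits; 0 means perfectly conserved."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(a, b):
    """Mutual information in bits between two alignment columns."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

# Hypothetical toy alignment: position 1 is fully conserved,
# positions 0 and 2 vary but always change together (K/E <-> E/K).
toy_alignment = [
    "KAE",
    "KAE",
    "EAK",
    "EAK",
    "KAE",
    "EAK",
]

for i in range(3):
    print(f"position {i}: entropy = {entropy(column(toy_alignment, i)):.2f} bits")

print("MI(0, 2) =", mutual_information(column(toy_alignment, 0), column(toy_alignment, 2)))
print("MI(0, 1) =", mutual_information(column(toy_alignment, 0), column(toy_alignment, 1)))
```

In this toy case positions 0 and 2 look un-conserved in a logo-style view, yet they carry a strong pairwise signal, while the conserved position 1 carries none. That is why the conservation graphic and the co-evolution output are best read side by side.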
September 3, 2013
  • We are excited to announce that the paper describing our latest work has been released in PNAS! [LINK][PDF]
  • A simplified version of the webpage is now available at gremlin.bakerlab.org; it includes only the Pfam analysis and the GREMLIN submission form. This is to prevent information overload for folks accessing the resource for the first time, as we continue to add other resources to openseq.org.
August 23, 2013
  • The online server has been updated to include more options and to make the resubmission process much easier. Options include:
    • Submit either a single sequence or a starting alignment.
    • Control the diversity of the alignment by adjusting the number of iterations and the e-value.
    • Focus on a region of interest by adjusting the coverage and gap removal filters.
    • We are working on adding the ability to select priors! Right now only the "Vanilla" option works.
  • The FAQ page is now live! It's hard to judge what is common knowledge and what is new to our users. Please help improve this page by submitting questions using our contact form! (Even if it's a question to which you already know the answer, but you feel others might benefit.)
August 5, 2013
  • We are working on setting up an online server for GREMLIN co-evolution analysis. The server is in BETA mode; any suggestions are welcome as we prep for public release!
July 30, 2013
  • Predictions for 2013 should be done. We are keeping the 2012 predictions for archive purposes.
  • We are in the process of uploading the alignments used in our calculations. Note: for GREMLIN runs we removed sites that had > 75% gaps; the provided alignments include these sites.
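For anyone wanting to apply the same gap filter to the uploaded alignments, here is a minimal sketch. It assumes the alignment is already loaded as a list of equal-length strings and that gaps are written as "-" or "."; the function name and the toy data are hypothetical, and the actual scripts may differ in detail.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.75, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    Returns the filtered sequences and the original indices of the kept
    columns, so results can be mapped back to the full alignment.
    """
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in the alignment must have equal length")

    n_seqs = len(alignment)
    kept = []
    for i in range(length):
        n_gaps = sum(seq[i] in gap_chars for seq in alignment)
        # Strictly "more than 75% gaps" is removed; exactly 75% is kept.
        if n_gaps / n_seqs <= max_gap_fraction:
            kept.append(i)

    filtered = ["".join(seq[i] for i in kept) for seq in alignment]
    return filtered, kept

# Hypothetical toy alignment: column 2 is all gaps, column 4 is exactly 75% gaps.
toy = ["MK-A-", "M--AC", "M--G-", "ML-A-"]
filtered, kept = drop_gappy_columns(toy)
print(kept)
print(filtered)
```

Returning the kept column indices alongside the filtered sequences makes it possible to translate positions in a filtered run back to positions in the provided, unfiltered alignment.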
July 24, 2013
  • We are updating our predictions to reflect new sequences that have been released since 2012. When you click on any of the Pfam families, you'll see a "2013" tab. The calculations are running and will be uploaded as they come in. Eventually the lists will be replaced with these new calculations.
  • We welcome any suggestions as we prep this webpage for public release.
PUBLICATIONS
  • Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information.
    Sergey Ovchinnikov, Hetunandan Kamisetty, and David Baker.
    eLife (2014).
    [LINK][PDF]
  • Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era.
    Hetunandan Kamisetty, Sergey Ovchinnikov, and David Baker.
    Proceedings of the National Academy of Sciences 110, no. 39 (2013): 15674-15679.
    [LINK][PDF]
  • Learning generative models for protein fold families.
    Sivaraman Balakrishnan, Hetunandan Kamisetty, Jaime G. Carbonell, Su-In Lee, and Christopher James Langmead.
    Proteins: Structure, Function, and Bioinformatics 79, no. 4 (2011): 1061-1078.
    [LINK][PDF]