This hybrid conference is an annual review of cloud computing developments and trends and their impact on the world economy and society.
Day 1 Tuesday, Dec 4 Mountain View
Day 2 Tuesday, Dec 6 Mountain View
Day 3 Wednesday, Dec 7 Virtual
Day 4 Thursday, Dec 8 Virtual
Day 5 Friday, Dec 9 Virtual

Bio4j: A pioneer graph based database for the integration of biological Big Data

Pablo Pareja (1), Eduardo Pareja-Tobes (1), Marina Manrique (1), Eduardo Pareja (1), Raquel Tobes (1)
(1) Oh no sequences! group, Era7 Bioinformatics, Pza Campoverde 3, Atico, Granada, Spain. ppareja@era7.com

The main aims of the Bio4j open source project are to integrate commonly used biological data resources (Uniprot, Uniref, Genbank, GeneOntology, RefSeq, Interpro...) into a property graph data model and to build corresponding fully cloud-enabled aggregated data distributions.

To integrate the current contents (Uniprot, GeneOntology, Uniref (50, 90, 100), and Refseq) we have used the Neo4j graph database, an open source, pure-Java transactional graph database engine, as the backend. We have also developed a simple data model that integrates the above sources into a property graph while staying as semantically close as possible to the originals (the API GitHub repository can be found here: https://github.com/pablopareja/Bio4jModel). The Bio4j database currently includes more than 500,000,000 relationships and 50,000,000 nodes and keeps growing every day, setting a precedent for bioinformatics Big Data modelled as a graph.

We developed an AWS CloudFormation template that automates the steps needed for cloud deployment. The template creates an EC2 instance from the AWS Linux AMI (user-configurable instance type) and EBS volumes containing the data and all required libraries. Apart from this AMI, deployment scripts are also available that should work on any Linux instance with Java available.

There are several entry points to the database, based on both exact and full-text indexes, which allow the user to start traversals at almost any point of the graph. Almost all attributes of the modelled entities are stored as node or relationship properties. The only exception is the Refseq sequences, which are stored as independent S3 files. These files are easily accessible by their Genome Element ID and can even be queried directly for a specific range of positions. This design keeps the database from being overloaded with static content that does not involve any interconnectivity.

What do you get?
- Graph database query capabilities; Neo4j in particular offers great query-language expressive power (see http://arxiv.org/abs/1004.1001).
- Integration of commonly used biological data sets in a single database. Complex queries can be expressed programmatically as fairly simple graph traversals.

URL for the overall project web site: http://www.bio4j.com
The particular Open Source License being used: AGPLv3
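
The ideas in the abstract (entities stored as nodes with properties, index lookups as traversal entry points, and sequences kept outside the graph and read by position range) can be illustrated with a minimal in-memory sketch. This is not the Bio4j or Neo4j API: the class, the relationship names (`ANNOTATED_WITH`, `ENCODED_IN`), the property names, and all identifiers are hypothetical placeholders, and a plain dictionary stands in for the S3 files.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph; an illustration only, not Bio4j's model."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(rel type, target id, props)]
        self.exact_index = {}               # (property name, value) -> node id

    def add_node(self, node_id, index_on=None, **props):
        self.nodes[node_id] = props
        if index_on is not None:
            # Exact-match index entry: lets a traversal start at this node.
            self.exact_index[(index_on, props[index_on])] = node_id
        return node_id

    def relate(self, source, rel_type, target, **props):
        # Relationships carry their own properties, like nodes do.
        self.out_edges[source].append((rel_type, target, props))

    def lookup(self, prop, value):
        return self.exact_index.get((prop, value))

    def neighbours(self, node_id, rel_type):
        return [t for (r, t, _) in self.out_edges[node_id] if r == rel_type]


def fetch_subsequence(sequence_store, genome_element_id, start, end):
    """Return positions start..end (1-based, inclusive) of an externally stored sequence."""
    sequence = sequence_store[genome_element_id]
    return sequence[start - 1:end]


graph = PropertyGraph()
protein = graph.add_node("n1", index_on="accession", accession="PROT_X", name="example protein")
go_term = graph.add_node("n2", index_on="go_id", go_id="GO:EXAMPLE", name="example term")
genome = graph.add_node("n3", index_on="genome_element_id", genome_element_id="GE_0001")
graph.relate(protein, "ANNOTATED_WITH", go_term, evidence="placeholder")
graph.relate(protein, "ENCODED_IN", genome)

# Sequences live outside the graph, keyed by genome element ID.
sequence_store = {"GE_0001": "ATGCGTACGTTAGC"}

# Enter the graph through the index, then traverse relationships.
start = graph.lookup("accession", "PROT_X")
terms = [graph.nodes[n]["go_id"] for n in graph.neighbours(start, "ANNOTATED_WITH")]
element = graph.neighbours(start, "ENCODED_IN")[0]
fragment = fetch_subsequence(sequence_store, graph.nodes[element]["genome_element_id"], 4, 9)
print(terms, fragment)
```

Keeping the sequence outside the graph means the graph itself holds only small, interconnected records, while the large static payload is read on demand and only for the requested range.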

- by Pablo Pareja Tobes

Bioinformatics researcher/consultant/developer of Oh no sequences! (Era7 Bioinformatics)



Author's Bio:
Currently working as a bioinformatics consultant/developer/researcher at http://www.ohnosequences.com. Amateur musician, traveller, and always eager to learn more about languages, plants... too many things to do and too little time for them!



Register today!

Thank you for your interest in the Second Annual UP 2011 conference. Please use the form below to register for full access to the conference. If you experience any problems with this form, or it does not render, please try to register directly at http://up11.eventbrite.com. If you still experience any difficulties, please contact us at info@up-con.com. For a feature comparison list, please visit this page.

A partial list of organizations that attended the UP 2010 Conference

  • Orange/France Telecom
  • Credit Suisse
  • Brookfield Asset Management
  • BlueScope Steel
  • Raytheon
  • Glenn Wells
  • Exxon Mobil
  • Trilogy International Partners
  • General Electric
  • First Command Financial Services
  • Southern California Edison/IT & BI
  • Denovo
  • Karyn Mashima Consulting, LLC
  • PwC
  • Infralogic Technology Resources
  • Datameer
  • Sun Microsystems, Inc.
  • Atos Origin
  • City Of Orlando
  • Blyth, Inc.
  • SunTrust Bank
  • AXIS Capital
  • JPMC
  • Makara
  • IES
  • Lockheed Martin
  • Nextpoint
  • SITA
  • Cisco Systems India (P) Ltd
  • Maxim Integrated Products
  • Ford Motor
  • Knorr-Bremse
  • Xerox
  • Bruce Power
  • Johnson Controls Inc.
  • Bank of America
  • Sodexo
  • Trelleborg
  • Fidelity Institutional Wealth Services
  • Johnson & Johnson
  • UM
  • CA Technologies Inc
  • Navega.com
  • Fidelity Investments
  • Capgemini
  • EMC
  • Ernst & Young
  • Assurant Solutions
  • Boeing
  • BeyondCore, Inc.
  • PureInbox Inc.
  • Nike
  • Immunet Corporation
  • Accenture
  • CSC
  • AEG Resources
  • Lulea University of Technology
  • PricewaterhouseCoopers LLP
  • Verizon
  • Verizon Business
  • Salesforce.com
  • Pfizer Inc
  • ANSYS
  • Brocade Communications Systems, Inc.
  • Lifestreet
  • Banco General
  • Sovereign Sense Consulting
  • Corticon
  • Nimbula
  • USWired Inc.
  • Zetta
  • PayDeg
  • Tyson Foods
  • Kkogyo Chosakai Publishing
  • Technical School Corfu
  • ARSYS INTERNET S.L.
  • RMS, Inc.
  • Rackspace
  • J.P. Morgan Chase
  • Cronos Group
  • GoGrid
  • GSA USA
  • Pareto Networks
  • IMEX Research
  • Systems & Consulting Inc
  • Bits Republic Technologies
  • Tripwire
  • Walt Disney
  • Synmotive BV

05-07 DECEMBER 2011