KB

Bio2RDF

We developed a semantic knowledge base for answering questions in the biomedical domain. The knowledge base is composed of the following resources:

1 - DrugBank: a source of drugs and drug targets.

2 - SIDER: a source of drug indications and drug effects.

3 - OMIM: a source of diseases and disease genes.

4 - ChEBI: an ontology of chemicals.

5 - DO: an ontology of diseases.

Example Questions

1. Which drugs can be used to treat pulmonary hypertension? http://bit.ly/1fDUnwf

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX dct: <http://purl.org/dc/terms/>

SELECT distinct ?drug_name

WHERE {

?s <http://bio2rdf.org/sider_vocabulary:indication> ?o .

?s rdfs:label ?drug_name .

?o rdfs:label ?disease_name .

FILTER regex (?disease_name,"^pulmonary hypertension ","i")

}
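
The query above can also be sent to the endpoint programmatically. The following is a minimal sketch, assuming the SPARQL endpoint [7] listed under Methods is reachable and returns the standard SPARQL JSON results format when asked for it through the `Accept` header. The function names (`build_query`, `build_request`, `extract_drug_names`, `fetch_drug_names`) and the input whitelist are illustrative choices, not part of the knowledge base itself.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint [7] from the Methods section; it may no longer be online.
ENDPOINT = "http://14.63.169.59:8891/sparql"

# Same graph pattern as the example query; only the disease name varies.
QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?drug_name
WHERE {{
  ?s <http://bio2rdf.org/sider_vocabulary:indication> ?o .
  ?s rdfs:label ?drug_name .
  ?o rdfs:label ?disease_name .
  FILTER regex(?disease_name, "^{disease} ", "i")
}}
"""


def build_query(disease):
    """Return the SPARQL query text for one disease name."""
    # Assumption: a conservative whitelist, so the name cannot break out of
    # the string literal or change the meaning of the regular expression.
    if not re.fullmatch(r"[A-Za-z0-9 \-]+", disease):
        raise ValueError("unsupported characters in disease name")
    return QUERY_TEMPLATE.format(disease=disease)


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_drug_names(results):
    """Pull the ?drug_name values out of a parsed SPARQL JSON result."""
    return [row["drug_name"]["value"] for row in results["results"]["bindings"]]


def fetch_drug_names(disease, endpoint=ENDPOINT):
    """Run the query against the endpoint (requires network access)."""
    request = build_request(build_query(disease), endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_drug_names(json.load(response))
```

Calling `fetch_drug_names("pulmonary hypertension")` would then return the drug labels as a Python list, provided the server is up. The trailing space in the regular expression is kept from the original query.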

Methods

The RDF data were generated using the open source scripts from the Bio2RDF project [1]. In particular, we used the Release 3 candidate scripts from the latest development version in Michel Dumontier's GitHub repository [2]. The resulting files were loaded into a pre-compiled build of Virtuoso 7.1.0 [3]. We used the bulk RDF loader [4] to load these files into their specified graphs. We then set up the facet browser by initializing its tables and updating its index [5]. The server is available via its facet interface [6] and SPARQL endpoint [7].

[1] http://bio2rdf.org

[2] https://github.com/micheldumontier/bio2rdf-scripts

[3] http://sourceforge.net/projects/virtuoso/files/virtuoso/7.1.0/

[4] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader

[5] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtFacetBrowserInstallConfig

[6] http://14.63.169.59:8891/fct

[7] http://14.63.169.59:8891/sparql
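
The bulk-loading step described above can be sketched as a small generator for the statements one would pass to Virtuoso's isql client. `ld_dir`, `rdf_loader_run` and `checkpoint` are the calls documented on the bulk loader page [4]; the function name, directory paths, file mask, and graph IRIs below are hypothetical placeholders, not the values used for this server.

```python
def bulk_load_statements(datasets):
    """Return isql statements that register and load RDF files.

    `datasets` is an iterable of (directory, file_mask, graph_iri) tuples.
    All three values are supplied by the caller; nothing here is specific
    to the server described in this document.
    """
    statements = []
    for directory, file_mask, graph_iri in datasets:
        for value in (directory, file_mask, graph_iri):
            # Keep the sketch simple: refuse values that would need
            # SQL string escaping instead of trying to escape them.
            if "'" in value:
                raise ValueError("single quotes are not supported: " + value)
        # Register every matching file in the directory for loading.
        statements.append(
            f"ld_dir('{directory}', '{file_mask}', '{graph_iri}');"
        )
    # Load everything that was registered, then persist the result.
    statements.append("rdf_loader_run();")
    statements.append("checkpoint;")
    return statements


# Hypothetical layout: one directory and one graph per dataset.
example = bulk_load_statements([
    ("/data/bio2rdf/sider", "*.nt", "http://example.org/graph/sider"),
    ("/data/bio2rdf/omim", "*.nt", "http://example.org/graph/omim"),
])
print("\n".join(example))
```

Note that Virtuoso only reads from directories listed under `DirsAllowed` in its configuration file, so the placeholder paths would have to be added there before the generated statements are run.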

Summaries of the contents of these datasets (copied from the Bio2RDF project) are available here:

http://14.63.169.59:81/kb/omim.html

http://14.63.169.59:81/kb/drugbank.html

http://14.63.169.59:81/kb/sider.html

BioGateway

The shortage of attractive interfaces through which end users (e.g. physicians) can exploit Linked Open Data (LOD) leaves much of the LOD effort in the shade. To overcome this problem, the question answering (QA) community and the LOD community should combine efforts to close the gap. Ideally, we expect to arrive at an interface in which users write questions in natural language (see examples below); each question is then sent to a knowledge base (e.g. a federated one), which returns a correct answer.

Disclaimer: This is not a comprehensive list of resources, only the list used at the OKBQA hackathon in 2014.

The list below provides a set of LOD resources within the Bio-community:

List of NL questions for those LOD resources:

    1. What are the experiments where the sample description contains diabetes? (Expression Atlas@EBI)

    2. What are the differentially expressed genes where factor is asthma? (Expression Atlas@EBI)

    3. For the genes differentially expressed in asthma, get the gene products associated with a Reactome pathway. (Expression Atlas@EBI)

    4. What are the preferred gene name and disease annotations of all human UniProt entries that are known to be involved in a disease? (UniProt)

    5. Find drug-like (but currently not approved) molecules which bind 7TM1 GPCRs with high affinity. (ChEMBL)

    6. Which proteins are involved in the activation of CREB1? (APO)

    7. Which plant proteins can be located in either the nucleus or the endoplasmic reticulum or any part thereof? (BioGateway)