Constructing custom knowledgebases and sequence datasets with publications

US9563741B2 · US · B2

Patent metadata
Field                Value
Publication number   US-9563741-B2
Application number   US-201414280285-A
Country              US
Kind code            B2
Filing date          May 16, 2014
Priority date        May 16, 2014
Publication date     Feb 7, 2017
Grant date           Feb 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Illustrative embodiments of custom knowledgebases and sequence datasets, as well as related methods, are disclosed. In one illustrative embodiment, one or more computer-readable media may comprise a custom knowledgebase and an associated sequence dataset. The custom knowledgebase may comprise a plurality of assertions that have been automatically extracted from a plurality of publications, where each of the plurality of assertions encodes a relationship between a subject and an object. The sequence dataset may comprise a plurality of called biological sequences, where each of the plurality of called biological sequences is associated with one or more of the plurality of assertions of the custom knowledgebase.
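
The data model described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the class names (`Assertion`, `CalledSequence`, `KnowledgeBase`), the predicate string, and the example gene, antibiotic, and sequence are all hypothetical. The abstract only requires that each assertion relate a subject to an object and that each called sequence be associated with one or more assertions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A subject-relationship-object statement (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class CalledSequence:
    """A called biological sequence plus indices of its linked assertions."""
    seq_id: str
    residues: str
    assertion_ids: list[int] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    """Holds assertions and resolves the ones linked to a sequence."""
    assertions: list[Assertion] = field(default_factory=list)

    def add(self, assertion: Assertion) -> int:
        # Store the assertion and return its index for later linking.
        self.assertions.append(assertion)
        return len(self.assertions) - 1

    def for_sequence(self, seq: CalledSequence) -> list[Assertion]:
        # Look up every assertion the sequence is associated with.
        return [self.assertions[i] for i in seq.assertion_ids]


# Made-up example data: one assertion linked to one called sequence.
kb = KnowledgeBase()
a0 = kb.add(Assertion("geneX", "confers_resistance_to", "antibioticA"))
seq = CalledSequence("seq-001", "ATGGCGTTAG", [a0])
linked = kb.for_sequence(seq)
```

Keeping assertions in the knowledgebase and storing only references on each sequence lets several sequences share one assertion, which matches the many-to-many association the abstract describes.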

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: automatically extracting a plurality of assertions from a plurality of publications, wherein each of the plurality of assertions encodes a relationship between a subject and an object; manually editing the plurality of assertions automatically extracted from the plurality of publications to construct a custom knowledgebase for a particular biological field; and constructing a sequence dataset comprising a plurality of called biological sequences, wherein each of the plurality of called biological sequences is associated with one or more of the plurality of assertions of the custom knowledgebase, wherein constructing the sequence dataset comprises: automatically extracting one or more called biological sequences from the plurality of publications; extracting additional called biological sequences from one or more publicly available databases; grouping the additional called biological sequences with the one or more called biological sequences automatically extracted from the plurality of publications in response to one or more predetermined resemblance criteria being met; and associating each group of called biological sequences with one or more of the plurality of assertions of the custom knowledgebase.

2. The method of claim 1, wherein manually editing the plurality of assertions automatically extracted from the plurality of publications comprises at least one of (i) selecting a subset of the plurality of assertions automatically extracted from the plurality of publications for inclusion in the custom knowledgebase, (ii) modifying the content of one or more of the plurality of assertions automatically extracted from the plurality of publications for inclusion in the custom knowledgebase, and (iii) creating one or more additional assertions for inclusion in the custom knowledgebase.

3. The method of claim 2, wherein the manual editing of the plurality of assertions automatically extracted from the plurality of publications is performed by one or more subject matter experts in the particular biological field.

4. The method of claim 3, wherein automatically extracting the plurality of assertions from the plurality of publications comprises utilizing natural language processing software to derive the plurality of assertions from the text of the plurality of publications.

5. The method of claim 4, wherein the plurality of publications comprise peer-reviewed articles selected by the subject matter experts.

6. The method of claim 4, wherein the natural language processing software has been trained by the subject matter experts to recognize relevant assertions in the text of the plurality of publications.

7. The method of claim 4, wherein each of the plurality of assertions is expressed as a Resource Description Framework (RDF) triple.

8. The method of claim 1, wherein the plurality of called biological sequences included in the sequence dataset and the associations between the plurality of called biological sequences and the plurality of assertions of the custom knowledgebase are manually edited by the subject matter experts.

9. One or more tangible, non-transitory computer-readable media comprising: a custom knowledgebase comprising a plurality of assertions that have been automatically extracted from a plurality of publications, wherein each of the plurality of assertions encodes a relationship between a subject and an object; a sequence dataset comprising a plurality of called biological sequences, wherein each of the plurality of called biological sequences is associated with one or more of the plurality of assertions of the custom knowledgebase; and a client application configured to: compare a plurality of sample biological sequences to the plurality of called biological sequences of the sequence dataset; and determine, for at least one sample biological sequence that resembles a called biological sequence of the sequence dataset, one or more probable characteristics associated with that sample biological sequence using one or more assertions of the custom knowledgebase that are associated with the called biological sequence that resembles that sample biological sequence, wherein the at least one sample biological sequence is not in the sequence dataset.

10. The one or more tangible, non-transitory computer-readable media of claim 9, wherein the plurality of assertions automatically extracted from the plurality of publications have also been manually edited by one or more subject matter experts in a biological field of the custom knowledgebase.

11. The one or more tangible, non-transitory computer-readable media of claim 9, wherein: the plurality of called biological sequences of the sequence dataset comprise at least one of called biological sequences that provide resistance to one or more antibiotics and called biological sequences that mediate regulation of antibiotic resistance; and the plurality of assertions of the custom knowledgebase comprise assertions that encode relationships between the called biological sequences of the sequence dataset and at least one of antibiotic resistance elements and regulatory elements.

12. The one or more tangible, non-transitory computer-readable media of claim 11, wherein the plurality of assertions of the custom knowledgebase further comprise assertions that encode relationships between antibiotic resistance elements and particular resisted antibiotics.

13. A method comprising: comparing a plurality of sample biological sequences to a plurality of called biological sequences included in a sequence dataset; retrieving, from a custom knowledgebase associated with the sequence dataset, one or more assertions that are associated with a called biological sequence of the sequence dataset that resembles one of the plurality of sample biological sequences, wherein the one of the plurality of sample biological sequences is not in the sequence dataset and wherein the custom knowledgebase comprises a plurality of assertions that have been automatically extracted from a plurality of publications, each of the plurality of assertions encoding a relationship between a subject and an object; and determining one or more probable characteristics associated with the sample biological sequence that resembles the called biological sequence of the sequence dataset using the one or more assertions retrieved from the custom knowledgebase.

14. The method of claim 13, wherein the plurality of assertions automatically extracted from the plurality of publications have also been manually edited by one or more subject matter experts in a biological field of the custom knowledgebase.

15. The method of claim 13, further comprising generating the plurality of sample biological sequences using massively parallel sequencing of a metagenomic sample.

16. The method of claim 13, wherein determining one or more probable characteristics associated with the sample biological sequence comprises determining one or more antibiotics likely to be resisted.

17. The method of claim 16, further comprising generating a report that comprises a ranked listing of the antibiotics likely to be resisted.
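
The lookup workflow of claims 13, 16, and 17 (compare sample sequences to called sequences, retrieve the associated assertions, and rank the antibiotics likely to be resisted) might look roughly like the sketch below. The claims do not specify a resemblance measure or a ranking rule, so everything here is an assumption made for illustration: the k-mer Jaccard score, the 0.5 threshold, ranking by best match score, the `confers_resistance_to` predicate, and the toy sequences and antibiotic names are all hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets (an assumed resemblance criterion)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Toy sequence dataset and knowledgebase (made-up data).
DATASET = {
    "called-1": "ATGGCGTTAGCCTAGGA",
    "called-2": "TTTTCCCCGGGGAAAAT",
}
ASSERTIONS = {
    "called-1": [("called-1", "confers_resistance_to", "antibioticA")],
    "called-2": [("called-2", "confers_resistance_to", "antibioticB")],
}


def rank_resisted(samples: list[str], threshold: float = 0.5):
    """Rank antibiotics by the best resemblance score of any matching sample."""
    best: dict[str, float] = defaultdict(float)
    for sample in samples:
        for seq_id, called in DATASET.items():
            score = resemblance(sample, called)
            if score < threshold:
                continue  # sample does not resemble this called sequence
            # Retrieve the assertions linked to the resembling sequence.
            for _subject, predicate, obj in ASSERTIONS[seq_id]:
                if predicate == "confers_resistance_to":
                    best[obj] = max(best[obj], score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# The first sample differs from called-1 in its last base; the second matches nothing.
samples = ["ATGGCGTTAGCCTAGGT", "CACACACACACACACAC"]
report = rank_resisted(samples)
```

Neither sample is itself in the dataset, which mirrors the claim language: characteristics are inferred for a new sequence from the assertions attached to the called sequence it resembles.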

Assignees

Battelle Memorial Institute

Inventors

Classifications

  • Physics · mapped topic

  • Knowledge engineering; Knowledge acquisition · CPC title

  • Physics · mapped topic

  • G06F19/28 · Primary

    Physics · mapped topic

  • G16B50/30 · Primary

    Data warehousing; Computing architectures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9563741B2 cover?
Illustrative embodiments of custom knowledgebases and sequence datasets, as well as related methods, are disclosed. In one illustrative embodiment, one or more computer-readable media may comprise a custom knowledgebase and an associated sequence dataset. The custom knowledgebase may comprise a plurality of assertions that have been automatically extracted from a plurality of publications, wher…
Who is the assignee on this patent?
Battelle Memorial Institute
What technology area does this patent fall under?
Primary CPC classification G06F19/28. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 7, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).