Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors

US9396222B2 · US · B2

Patent metadata
Publication number: US-9396222-B2
Application number: US-201414531255-A
Country: US
Kind code: B2
Filing date: Nov 3, 2014
Priority date: Nov 13, 2006
Publication date: Jul 19, 2016
Grant date: Jul 19, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein is a method and system for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of metadata indexes about unstructured data can be hardware-accelerated by processing streaming unstructured data through a reconfigurable logic device to generate the metadata about the unstructured data for the index.
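The indexing flow summarized in the abstract can be pictured with a small software analogy. The patent performs this work in firmware modules on a reconfigurable logic device; the Python sketch below only mirrors the data flow (streamed items, pipelined stages, metadata records, index) and says nothing about the hardware design. All names in it (`DataItem`, `Metadata`, `tokenize`, `detect_names`, `build_index`, the source labels and locations) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple


class DataItem(NamedTuple):
    source: str    # hypothetical source label, e.g. "email" or "news"
    location: str  # where the item can later be retrieved from
    text: str


class Metadata(NamedTuple):
    term: str      # the detected name term
    source: str
    location: str  # pointer back to the data item containing the term


def tokenize(items: Iterable[DataItem]) -> Iterator[Tuple[DataItem, List[str]]]:
    """Stage 1: split each streamed item into lowercase word tokens."""
    for item in items:
        words = [w.strip(".,;:!?\"'()").lower() for w in item.text.split()]
        yield item, [w for w in words if w]


def detect_names(
    stream: Iterable[Tuple[DataItem, List[str]]], names: Set[str]
) -> Iterator[Metadata]:
    """Stage 2: emit one metadata record per watched name found in an item."""
    for item, words in stream:
        for term in sorted(names.intersection(words)):
            yield Metadata(term, item.source, item.location)


def build_index(records: Iterable[Metadata]) -> Dict[str, List[str]]:
    """Stage 3: map each detected term to the locations of matching items."""
    index: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        index[rec.term].append(rec.location)
    return dict(index)


items = [
    DataItem("email", "mail/001", "Meeting with Alice about the Acme merger."),
    DataItem("news", "news/17", "Acme shares rose after Bob spoke."),
]
# Generators are chained so each item flows through all stages in turn.
index = build_index(detect_names(tokenize(items), {"alice", "acme", "bob"}))
print(index)
```

Chaining generators keeps the stages independent, which is loosely analogous to the pipelined modules the abstract describes, although a software pipeline of this kind does not run at hardware processing speeds.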

First claim

Opening claim text (preview).

What is claimed is:
1. A method for building a metadata index for unstructured data for a plurality of different data sources, the method comprising: receiving streaming unstructured data into a reconfigurable logic device, the streaming unstructured data comprising a plurality of data items for a plurality of different sources, wherein the reconfigurable logic device has a plurality of pipelined firmware application modules deployed thereon; the pipelined firmware application modules analyzing the streaming unstructured data to generate metadata about the streaming unstructured data at hardware processing speeds, the analyzing including detecting whether a term relating to a name is found in any of the data items, the generated metadata comprising data associated with the data item that is indicative of where a data item having the detected term can be located; and generating an index about the streaming unstructured data from the generated metadata, the index for subsequent querying to locate data items of interest based on associations between the metadata and the data items.
2. The method of claim 1 wherein the data items comprise at least two members of the group consisting of (1) a plurality of news reports, (2) a plurality of web pages, (3) a plurality of market analyses, (4) a plurality of emails, (5) a plurality of social network communications, and (6) a plurality of corporate documents.
3. The method of claim 2 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA), the FPGA having the pipelined firmware application modules deployed thereon.
4. The method of claim 3 wherein the analyzing step comprises the pipelined firmware application modules performing a classification operation on the streaming unstructured data to determine classification information for the data items, the generated metadata including the determined classification information; and wherein the index generating step further comprises generating an index about the streaming unstructured data based on the determined classification information.
5. The method of claim 4 wherein the classification operation performing step includes the pipelined firmware application modules generating word counts for the data items, the determined classification information being based on the generated word counts.
6. The method of claim 2 further comprising storing the generated index in a database for subsequent querying.
7. The method of claim 2 further comprising: streaming a plurality of the data items into the reconfigurable logic device from a plurality of remote data sources via a network interface.
8. The method of claim 2 further comprising: performing a lookup using the generated index as part of a reputation analysis operation for an enterprise.
9. The method of claim 2 wherein the detecting comprises: the pipelined firmware application modules identifying a plurality of names that are found within the data items, the index indexing the data items by the found names; wherein the method further comprises: performing a plurality of lookups relating to a plurality of the names using the generated index; and determining a connectedness for a plurality of individuals based on the lookups.
10. The method of claim 9 wherein the at least two members comprise the emails and the social network communications.
11. The method of claim 9 the data items comprise at least three members of the group consisting of (1) a plurality of news reports, (2) a plurality of web pages, (3) a plurality of market analyses, (4) a plurality of emails, (5) a plurality of social network communications, and (6) a plurality of corporate documents.
12. The method of claim 2 wherein the detecting comprises: the pipelined firmware application modules identifying a plurality of names that are found within the data items, the index indexing the data items by the found names; wherein the method further comprises: performing a plurality of lookups relating to a plurality of the names using the generated index; and determining a connectedness for a plurality of organizations based on the lookups.
13. The method of claim 12 wherein the at least two members comprise the emails and the corporate documents.
14. The method of claim 12 the data items comprise at least three members of the group consisting of (1) a plurality of news reports, (2) a plurality of web pages, (3) a plurality of market analyses, (4) a plurality of emails, (5) a plurality of social network communications, and (6) a plurality of corporate documents.
15. The method of claim 2 wherein the detecting comprises: the pipelined firmware application modules identifying a plurality of names that are found within the data items, the index indexing the data items by the found names; wherein the method further comprises: performing a plurality of lookups relating to a plurality of the names using the generated index; and determining a connectedness for a plurality of individuals and organizations based on the lookups.
16. The method of claim 2 further comprising: integrating the index with structured data relating to the name in a structured database.
17. An apparatus for building a metadata index for unstructured data for a plurality of different data sources, the apparatus comprising: a reconfigurable logic device; and a memory; wherein the reconfigurable logic device is configured to receive streaming unstructured data, the streaming unstructured data comprising a plurality of data items for a plurality of different sources, wherein the reconfigurable logic device has a plurality of pipelined firmware application modules deployed thereon; the pipelined firmware application modules configured to perform analysis of the streaming unstructured data to generate metadata about the streaming unstructured data at hardware processing speeds, the analysis including a detection by the pipelined firmware application modules whether a term relating to a name is found in any of the data items, the generated metadata comprising data associated with the data item that is indicative of where a data item having the detected term can be located; and the memory configured to store an index about the streaming unstructured data from the generated metadata, the index for querying to locate data items of interest based on associations between the metadata and the data items.
18. A method for integrating unstructured data for a plurality of different data sources, the method comprising: streaming unstructured data through a field programmable gate array (FPGA), the unstructured data comprising at least two members of the group consisting of (1) a plurality of emails, (2) a plurality of social network communications, (3) a plurality of corporate documents, and (4) a plurality of news reports; the FPGA performing a metadata generation operation on the unstructured data streamed therethrough to thereby generate metadata about the unstructured data; storing the unstructured data in a data store of unstructured data; storing the metadata about the unstructured data in a database of structured data; and determining a connectedness of a plurality of subjects based on an analysis of the stored unstructured data and the stored metadata.
19. The method of claim 18 wherein the metadata includes an identification of where the unstructured data is stored in the data store of unstructured data.
20. The method of claim 19 wherein the performing step further comprises: the
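
Claims 9, 12 and 15 recite performing lookups against the name index and determining a "connectedness" from them, but the claim text previewed here does not define how connectedness is measured. The sketch below assumes one simple possibility, counting the data items in which two names co-occur; this metric, the function name `connectedness`, and the sample index contents are all illustrative assumptions rather than the patent's method.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical name index: name -> locations of data items mentioning it.
name_index: Dict[str, List[str]] = {
    "alice": ["mail/001", "mail/002", "social/9"],
    "bob": ["mail/002", "social/9", "news/17"],
    "carol": ["news/17"],
}


def connectedness(
    index: Dict[str, List[str]], names: Iterable[str]
) -> Dict[Tuple[str, str], int]:
    """Score each pair of names by how many data items mention both.

    Assumption: co-occurrence count stands in for "connectedness".
    """
    # One index lookup per name, as in "a plurality of lookups".
    lookups = {name: set(index.get(name, [])) for name in names}
    scores: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(lookups), 2):
        shared = len(lookups[a] & lookups[b])
        if shared:
            scores[(a, b)] = shared
    return scores


scores = connectedness(name_index, ["alice", "bob", "carol"])
print(scores)
```

Because the scoring works only on index lookups, it never rereads the underlying data items, which is the practical benefit of indexing the items by the names found in them.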

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9396222B2 cover?
Disclosed herein is a method and system for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of metadata indexes about unstructured data can be hardware-accelerated by processing streaming unstructured data through a reconfigurable…
Who is the assignee on this patent?
IP Reservoir, LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/30321. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 19, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).