Method and system for high performance integration, processing and searching of structured and unstructured data

US11449538B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11449538-B2
Application number  US-201916259326-A
Country             US
Kind code           B2
Filing date         Jan 28, 2019
Priority date       Nov 13, 2006
Publication date    Sep 20, 2022
Grant date          Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are methods and systems for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of feature vectors about unstructured data can be hardware-accelerated by processing streaming unstructured data through a reconfigurable logic device, a graphics processor unit (GPU), or chip multi-processor (CMP) to determine features that can aid clustering of similar data objects.
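
The idea in the abstract (word-frequency feature vectors computed by parallel engines, then used to group similar data objects) can be illustrated with a minimal software sketch. It is not the patented implementation: a thread pool stands in for the reconfigurable logic device, GPU, or CMP, and the names `extract_stream`, `cosine`, `cluster`, and the 0.5 similarity threshold are hypothetical choices made for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import math
import re

WORD_RE = re.compile(r"[a-z0-9']+")  # assumed tokenizer, not from the patent


def extract_one(item):
    """Compute the word-frequency feature vector for one (id, text) data object."""
    obj_id, text = item
    return obj_id, Counter(WORD_RE.findall(text.lower()))


def extract_stream(stream, engines=4):
    """Run several workers over the stream and return {object id: feature vector}.

    The worker pool is a software stand-in for parallel hardware engines.
    The returned dict is the in-memory association between features and objects.
    """
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return dict(pool.map(extract_one, stream))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(features, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []
    for obj_id, vec in features.items():
        for group in clusters:
            if cosine(vec, features[group[0]]) >= threshold:
                group.append(obj_id)
                break
        else:
            clusters.append([obj_id])
    return clusters


docs = {
    "a": "stock price rises stock",
    "b": "stock price falls",
    "c": "rain and wind",
}
features = extract_stream(docs.items())
groups = cluster(features)
```

Here `features["a"]["stock"]` is the word count for "stock" in object "a", and `groups` places "a" and "b" together because their word-frequency vectors overlap, while "c" forms its own cluster. The greedy pass is only one simple clustering criterion; the source does not specify which criteria are used.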

First claim

Opening claim text (preview).

What is claimed is:

1. A method for low latency and high throughput feature vector extraction, the method comprising: receiving streaming unstructured data into a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), and (3) a chip multi-processor (CMP), the streaming unstructured data comprising a plurality of data objects, wherein the data objects include a plurality of words, and wherein the member has a plurality of parallel processing engines deployed thereon; the parallel processing engines analyzing the data objects while the data objects stream through the member to perform a plurality of feature vector extraction operations on the streaming data objects that determine a plurality of features of the streaming data objects, wherein the determined features include a frequency of words within the data objects; and creating an association that is physically represented in memory between the determined features and the data objects.

2. The method of claim 1 wherein the analyzing step comprises generating a word count for a plurality of the words in the streaming data objects.

3. The method of claim 1 wherein the analyzing step comprises generating histograms with respect to a plurality of the words in the streaming data objects.

4. The method of claim 1 further comprising: performing clustering of the data objects based on the determined features to find clusters of the data objects that share similar features according to clustering criteria.

5. The method of claim 1 wherein the member comprises the reconfigurable logic device.

6. The method of claim 5 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA).

7. The method of claim 6 wherein the FPGA comprises a plurality of FPGAs.

8. The method of claim 7 wherein the parallel processing engines are partitioned across a plurality of the FPGAs.

9. The method of claim 1 wherein the member comprises the GPU.

10. The method of claim 1 wherein the member comprises the CMP.

11. The method of claim 1 further comprising the parallel processing engines creating an index of the streaming data objects based on the determined features.

12. The method of claim 11 wherein the index is stored as structured data in the database, the method further comprising: storing the streaming unstructured data in a data store of unstructured data; receiving a query that is directed toward a combination of structured data and unstructured data; accessing structured data in the database according to the classification index in response to the query to identify a subset of the unstructured data that is to be analyzed against the query; and performing a query-specified data analysis operation on the identified subset of unstructured data to thereby generate data for a response to the query; wherein the accessing step is conducted by a processor; and wherein the step of performing the query-specified data analysis operation is conducted by the member.

13. An apparatus for low latency and high throughput feature vector extraction, the apparatus comprising: a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), and (3) a chip multi-processor (CMP), the member configured to receive streaming unstructured data, the streaming unstructured data comprising a plurality of data objects, wherein the data objects include a plurality of words, and wherein the member has a plurality of parallel processing engines deployed thereon; the parallel processing engines configured to (1) analyze the data objects while the data objects stream through the member to perform a plurality of feature vector extraction operations on the streaming data objects that determine a plurality of features of the streaming data objects, wherein the determined features include a frequency of words within the data objects, and (2) create an association that is physically represented in memory between the determined features and the data objects.

14. The apparatus of claim 13 wherein the parallel processing engines include a parallel processing engine configured to generate a word count for a plurality of the words in the streaming data objects.

15. The apparatus of claim 13 wherein the parallel processing engines include a parallel processing engine configured to generate histograms with respect to a plurality of the words in the streaming data objects.

16. The apparatus of claim 13 further comprising: a processor configured to cluster the data objects based on the determined features to find clusters of the data objects that share similar features according to clustering criteria.

17. The apparatus of claim 13 wherein the member comprises the reconfigurable logic device.

18. The apparatus of claim 17 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA).

19. The apparatus of claim 18 wherein the FPGA comprises a plurality of FPGAs.

20. The apparatus of claim 19 wherein the parallel processing engines are partitioned across a plurality of the FPGAs.

21. The apparatus of claim 13 wherein the member comprises the GPU.

22. The apparatus of claim 13 wherein the member comprises the CMP.

23. The apparatus of claim 13 wherein the parallel processing engines include a parallel processing engine configured to create an index of the streaming data objects based on the determined features.

24. The apparatus of claim 23 further comprising: a database in which the index is stored; a data store in which the streaming unstructured data is stored; and a processor configured to (1) receive a query that is directed toward a combination of structured data and unstructured data and (2) access structured data in the database according to the index in response to the query to identify a subset of the unstructured data that is to be analyzed against the query; and wherein the member is configured to perform a query-specified data analysis operation on the identified subset of unstructured data to thereby generate data for a response to the query.
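
The two-stage query flow recited in claims 11, 12, 23, and 24 (an index built from the extracted features narrows the unstructured data to a subset, and a query-specified analysis then runs only on that subset) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed apparatus: in-memory dictionaries stand in for the database and the data store, ordinary Python functions stand in for the processor and the hardware member, and the class `HybridStore` with its methods `ingest` and `query` is a hypothetical name.

```python
from collections import Counter, defaultdict
import re

WORD_RE = re.compile(r"[a-z0-9']+")  # assumed tokenizer, not from the patent


class HybridStore:
    """Toy stand-in for a database (index) plus an unstructured data store."""

    def __init__(self):
        self.documents = {}            # unstructured data store: id -> raw text
        self.features = {}             # id -> word-frequency feature vector
        self.index = defaultdict(set)  # structured index: word -> set of ids

    def ingest(self, obj_id, text):
        """Extract features from one data object and index it by its words."""
        counts = Counter(WORD_RE.findall(text.lower()))
        self.documents[obj_id] = text
        self.features[obj_id] = counts
        for word in counts:
            self.index[word].add(obj_id)

    def query(self, index_term, analysis):
        """Stage 1: use the index to pick candidates. Stage 2: analyze only those."""
        candidates = self.index.get(index_term.lower(), set())
        return {
            obj_id: analysis(self.documents[obj_id])
            for obj_id in sorted(candidates)
        }


store = HybridStore()
store.ingest("d1", "Invoice total $40 for widgets")
store.ingest("d2", "Meeting notes about widgets")
store.ingest("d3", "Invoice total $15")

# Hypothetical query-specified analysis: find dollar amounts in the raw text.
amounts = store.query("invoice", lambda text: re.findall(r"\$\d+", text))
```

In this sketch the index lookup plays the role of the structured-data access, and the pattern search plays the role of the query-specified data analysis operation. Document "d2" is never scanned because the index excludes it, which is the point of narrowing the subset before running the more expensive analysis.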

Assignees

Inventors

Classifications

  • Query execution (filtering based on additional data G06F16/335) · CPC title

  • Indexing structures · CPC title

  • Indexing structures · CPC title

  • Approximate or statistical queries · CPC title

  • G06F16/284 · Primary

    Relational databases · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11449538B2 cover?
Disclosed herein are methods and systems for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of feature vectors about unstructured data can be hardware-accelerated by processing streaming unstructured data through a reconfigurable…
Who is the assignee on this patent?
IP Reservoir LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 20, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).