System and method for analysis of structured and unstructured data

US10846341B2 · US · B2

Patent metadata
Publication number: US-10846341-B2
Application number: US-201816159088-A
Country: US
Kind code: B2
Filing date: Oct 12, 2018
Priority date: Oct 13, 2017
Publication date: Nov 24, 2020
Grant date: Nov 24, 2020

Abstract

Official abstract text for this publication.

The invention relates to computer-implemented systems and methods for analyzing and standardizing various types of input data such as structured data, semi-structured data, unstructured data, and images and voice. Embodiments of the systems and the methods further provide for generating responses to specific questions based on the standardized input data.
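The standardized "converted file" is easiest to picture as a data structure. The sketch below is a minimal, hypothetical Python model (Python because claims 3 and 4 name it) of the fields recited in claims 4, 6 and 7: a document name, file path, file type and binary representation, plus elements that carry an identifier, a type and key-value attributes, all serialized as JSON. The class names, field names and the base64 encoding of the binary content are assumptions made for illustration, not the patentee's schema.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """One element of the converted file (hypothetical field names)."""
    element_id: str                 # element identifier (claim 1)
    element_type: str               # element type, e.g. "paragraph" or "table_cell"
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value pairs (claim 7)


@dataclass
class ConvertedFile:
    """Format-neutral container for one input file (assumed schema)."""
    name: str                       # name of the document (claim 6)
    path: str                       # file path for the document (claim 6)
    file_type: str                  # file type, e.g. "pdf" or "wav" (claim 6)
    content_b64: str                # binary representation, base64-encoded so it fits in JSON
    # Elements sit in one flat list, none nested under another: one
    # possible reading of the claimed "non-hierarchical relationship format".
    elements: List[Element] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for inter-process communication (claim 4)."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "ConvertedFile":
        """Rebuild the dataclasses from a JSON string produced by to_json."""
        raw = json.loads(text)
        raw["elements"] = [Element(**e) for e in raw["elements"]]
        return ConvertedFile(**raw)


def convert(name: str, path: str, file_type: str, data: bytes) -> ConvertedFile:
    """Wrap the raw bytes of any input type in the same standardized structure.
    Claim 17's format-specific parser, which would create elements from
    extracted metadata, is not modeled here."""
    return ConvertedFile(name=name, path=path, file_type=file_type,
                         content_b64=base64.b64encode(data).decode("ascii"))


if __name__ == "__main__":
    doc = convert("lease.txt", "/tmp/lease.txt", "text", b"Term: 5 years")
    doc.elements.append(Element("e1", "paragraph", {"page": "1"}))
    restored = ConvertedFile.from_json(doc.to_json())
    print(restored == doc)
```

Because every input type ends up in the same structure, downstream code only has to understand one format; how the patented system actually lays out elements is not disclosed beyond the claim language.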

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for analyzing at least one of structured and unstructured data, the method comprising: receiving at least one specific question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and applying an artificial intelligence process to the at least one input file, the artificial intelligence process comprising the steps of: generating, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generating the at least one element, wherein the at least one element includes an element identifier and an element type, and is stored in a non-hierarchical relationship format to other elements; generating at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; reading, via a machine review portion of the artificial intelligence process, the at least one expression; applying, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and applying the answer to the specific question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.

2. The method of claim 1, wherein the converted file is configured to interface with a dynamic, interpreted language.

3. The method of claim 1, wherein the converted file is configured to interface with Python.

4. The method of claim 1, wherein the converted file is (i) implemented in Python and includes computer-object representations as Python objects and (ii) serialized as JSON for inter-process communication.

5. The method of claim 1, wherein the converted file is configured for use with JSON, Swagger (YAML), and RESTful.

6. The method of claim 1, wherein the converted file further includes a name of the document, a file path for the document, a file type of the document, and a binary representation of the document.

7. The method of claim 1, wherein the at least one element further includes at least one attribute, wherein the at least one attribute comprises a key-value pair.

8. The method of claim 1, wherein the expression is configured to interface with the format of the converted file.

9. The method of claim 1, wherein the at least one element is generated and stored in a stand-off annotation format in an annotated file, wherein the at least one expression is applied to the annotated file to generate the output file.

10. The method of claim 9, wherein the expression string (i) specifies at least one of a programmatic logical operation and a pattern to search for in the annotated file and (ii) incorporates subject matter expertise for a particular question.

11. The method of claim 1, wherein the domain-specific language is stored in a comma-separated-value file.

12. A system for analyzing at least one of structured and unstructured data, the system comprising: a scanner, wherein the scanner is configured to receive at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and a server, wherein the server is configured to: receive at least one specific question and the scanned at least one input file; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generate the at least one element, wherein the at least one element includes an element identifier and an element type and is stored in a non-hierarchical relationship format to other elements; generate at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and apply the answer to the specific question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.

13. The system of claim 12, wherein the server includes an intelligent domain engine (IDE), wherein the IDE is configured to: receive the at least one expression, apply the at least one expression to the at least one input file, and output the answer to the specific question based on the applying.

14. The system of claim 13, wherein the IDE incorporates at least one of natural language processing, machine learning, annotation components, and manually-encoded expressions to classify and analyze the at least one input file.

15. The system of claim 12, wherein the server is further configured to: extract original source data and metadata from the at least one input file, store the extracted original source data in the converted file, generate the at least one element based on a conversion of the extracted metadata, store the generated at least one element in the converted file.

16. The system of claim 15, wherein the metadata is at least one of author information, page information, paragraph information, and font information.

17. The system of claim 15, wherein the extracted metadata is converted with a format-specific parser.

18. The system of claim 12, wherein the server is further configured to perform at least one of entity resolution and semantic annotation on the at least one input file.

19. The system of claim 18, wherein (i) the entity resolution determines a match between data associated with the at least one input file and data associated with at least one ontology and (ii) the semantic annotation connects the data associated with the at least one input file with the data associated with at least one ontology.

20. A system for analyzing at least one of structured and unstructured data, the system comprising: a server, wherein the server is configured to: receive at least one specific question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generate the at least one element, wherein the at least one element includes an element identifier and an element type and is stored in a non-hierarchical relationship format to other elements; generate, by a artificial intelligence operator, at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and apply the answer to
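Claims 9 to 11 describe elements stored as stand-off annotations (annotations kept apart from the text and pointing into it by position), expression strings in a domain-specific language that combine a logical operation with a search pattern, and that language being stored in a comma-separated-value file. The following Python sketch shows one way these pieces could fit together. The `TYPE <element type> MATCH <regex>` expression syntax, the CSV column names, the character-offset annotations and every function name are hypothetical; the claims do not disclose the actual language, and the feedback step of claim 1 is not modeled.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Stand-off annotation: refers to the text by character offsets
    instead of embedding markup in it (hypothetical layout)."""
    element_id: str
    element_type: str
    start: int
    end: int


# Hypothetical DSL rules kept in CSV form (claim 11 says only that the
# DSL is stored in a CSV file; this syntax is invented for the sketch).
RULES_CSV = r"""question,expression
What is the lease term?,TYPE clause MATCH (\d+) years
Who is the tenant?,TYPE party MATCH Tenant: (\w+)
"""

# Expression grammar assumed here: TYPE <element type> MATCH <regex>
EXPRESSION = re.compile(r"TYPE (\S+) MATCH (.+)")


def load_rules(csv_text: str) -> Dict[str, str]:
    """Map each question to its expression string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["question"]: row["expression"] for row in reader}


def apply_expression(expression: str, text: str,
                     annotations: List[Annotation]) -> Optional[str]:
    """Return the first match of the expression, or None if nothing matches."""
    parsed = EXPRESSION.fullmatch(expression.strip())
    if parsed is None:
        raise ValueError(f"unparseable expression: {expression!r}")
    wanted_type, pattern = parsed.group(1), re.compile(parsed.group(2))
    for ann in annotations:
        # Logical AND: the element type must match and the pattern
        # must occur inside the span the annotation points to.
        if ann.element_type != wanted_type:
            continue
        hit = pattern.search(text[ann.start:ann.end])
        if hit:
            # Return the first capture group if the pattern has one.
            return hit.group(1) if pattern.groups else hit.group(0)
    return None


def answer_question(question: str, rules: Dict[str, str], text: str,
                    annotations: List[Annotation]) -> Optional[str]:
    """Look up the expression for a question and apply it."""
    return apply_expression(rules[question], text, annotations)


if __name__ == "__main__":
    text = "Tenant: Acme. The lease runs for 5 years."
    annotations = [
        Annotation("e1", "party", 0, 13),    # covers "Tenant: Acme."
        Annotation("e2", "clause", 14, 41),  # covers "The lease runs for 5 years."
    ]
    rules = load_rules(RULES_CSV)
    for q in rules:
        print(q, "->", answer_question(q, rules, text, annotations))
```

Keeping annotations outside the text means several annotation layers can point at the same source without altering it, and the rules file can be edited by a subject-matter expert without touching code; both are design assumptions of this sketch.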

Assignees

KPMG LLP

Inventors

Classifications (CPC titles)

  • Probabilistic graphical models, e.g. probabilistic networks

  • Active learning

  • Supervised learning

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

  • Classification of content, e.g. text, photographs or tables

Frequently asked questions

What does patent US10846341B2 cover?
It claims methods and systems that convert an input file (text, an image, an audio file, a video file, a table, or a database) into a standardized format made of identified, typed elements, apply expressions written in a domain-specific language to that converted file to answer a specific question, and feed the answer back to improve the accuracy of the artificial intelligence process.
Who is the assignee on this patent?
KPMG LLP
What technology area does this patent fall under?
Primary CPC classification G06F16/90332. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 24, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).