System and method for analysis of structured and unstructured data

US11537662B2 · US · B2

Patent metadata
Publication number: US-11537662-B2
Application number: US-202017100019-A
Country: US
Kind code: B2
Filing date: Nov 20, 2020
Priority date: Oct 13, 2017
Publication date: Dec 27, 2022
Grant date: Dec 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention relates to computer-implemented systems and methods for analyzing and standardizing various types of input data such as structured data, semi-structured data, unstructured data, and images and voice. Embodiments of the systems and the methods further provide for generating responses to specific questions based on the standardized input data.
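As a rough illustration of what "standardizing" several input types could look like, the sketch below converts a text snippet and a small table into one flat list of elements, each carrying an identifier and a type as recited in the first claim. The patent page does not disclose the actual data format, so the `Element` class, the `standardize_text` and `standardize_table` helpers, and the metadata keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Element:
    """One unit of extracted content (hypothetical layout, not the patent's format)."""
    element_id: int
    element_type: str          # e.g. "sentence" or "cell"
    text: str
    metadata: dict = field(default_factory=dict)


def standardize_text(text, ids):
    """Split plain text on periods and emit one 'sentence' element per piece."""
    return [
        Element(next(ids), "sentence", piece.strip(), {"source": "text"})
        for piece in text.split(".")
        if piece.strip()
    ]


def standardize_table(rows, ids):
    """Emit one 'cell' element per table cell; row and column live in metadata."""
    elements = []
    for row_index, row in enumerate(rows):
        for column, value in row.items():
            elements.append(
                Element(
                    next(ids),
                    "cell",
                    str(value),
                    {"source": "table", "row": row_index, "column": column},
                )
            )
    return elements


# Both input types land in the same flat (non-nested) list of elements.
ids = count(1)
converted = standardize_text("Rent is due monthly. The term is five years.", ids)
converted += standardize_table([{"tenant": "Acme", "rent": 1200}], ids)
```

Keeping the elements in a flat list, with position information held as metadata rather than as nesting, is one simple way to read the claim's "non-hierarchical relationship format"; other readings are possible.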

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for analyzing at least one of structured and unstructured data, the method comprising: identifying at least one question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and applying an artificial intelligence process to the at least one input file, the artificial intelligence process comprising the steps of: generating, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type, and is stored in a non-hierarchical relationship format; applying a specific ontology to the converted file to perform semantic annotation to the converted file; generating, based on the semantic annotation, at least one expression, the at least one expression comprising one or more of specific words, relationships between specific words, and word patterns that identify specific content in a converted file, wherein the expression comprises an expression string in a domain-specific language; reading, via a machine review portion of the artificial intelligence process, the at least one expression; and applying, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to automatically generate a response to the question; and applying the answer to the at least one question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.

2. The method of claim 1, wherein the data format represents extracted data from the at least one input file and corresponding metadata.

3. The method of claim 1, wherein the at least one element is stored in an annotation format where the at least one element is stored separately from the at least one input file.

4. The method of claim 1, wherein the at least one expression specifies one or more words, a relationship between the one or more words and at least one pattern that identifies document features.

5. The method of claim 1, wherein the at least one expression represents one or more features to be utilized and one or more patterns of the features to be identified.

6. The method of claim 1, wherein the at least one expression is an input to an intelligent domain engine (IDE) that leverages natural language processing to systematically classify and analyze a corpus of documents.

7. The method of claim 6, wherein the intelligent domain engine further comprises a user interface to enable a user to modify the at least one expression.

8. The method of claim 1, wherein the response to the question is communicated via a user interface.

9. The method of claim 8, wherein the user interface displays support and justification associated with the response.

10. A system for analyzing at least one of structured and unstructured data, the system comprising: a scanner configured to receive at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and a server, wherein the server is configured to: identify at least one question and the scanned at least one input file; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type and is stored in a non-hierarchical relationship format; apply a specific ontology to the converted file to resolve entities and perform semantic annotation, the entity resolution comprising one or more determinations of whether entities detected in the converted file refer to one or more real-world entities, and the semantic annotation comprising relating one or more phrases in the converted file to one or more concepts formally defined in the specific ontology; generate at least one expression, the at least one expression comprising one or more of specific words, relationships between specific words, and word patterns that identify specific content in a converted file, wherein the expression comprises an expression string in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; and apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to automatically generate a response to the question; and apply the answer to the at least one question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.

11. The system of claim 10, wherein the data format represents extracted data from the at least one input file and corresponding metadata.

12. The system of claim 10, wherein the at least one element is stored in an annotation format where the at least one element is stored separately from the at least one input file.

13. The system of claim 10, wherein the at least one expression specifies one or more words, a relationship between the one or more words and at least one pattern that identifies document features.

14. The system of claim 10, wherein the at least one expression represents one or more features to be utilized and one or more patterns of the features to be identified.

15. The system of claim 10, wherein the at least one expression is an input to an intelligent domain engine (IDE) that leverages natural language processing to systematically classify and analyze a corpus of documents.

16. The system of claim 15, wherein the intelligent domain engine further comprises a user interface to enable a user to modify the at least one expression.

17. The system of claim 10, wherein the response to the question is communicated via a user interface.

18. The system of claim 17, wherein the user interface displays support and justification associated with the response.

19. A system for analyzing at least one of structured and unstructured data, the system comprising: a server, wherein the server is configured to: identify at least one question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type and is stored in a non-hierarchical relationship format; apply a specific ontology to the converted file to resolve entities and perform semantic annotation, the entity resolution comprising one or more determinations of whether entities detected in the converted file refer to one or more real-world entities, and the semantic annotation comprising relating one or more phrases in the converted file to one or more concepts formally defined in the specific ontology; generate, by an artificial intelligence operator, at least one expression, the at least one expression compris
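The annotation and expression steps recited in the claims can be pictured with a toy example. The sketch below is only an assumption-laden illustration: the ontology contents, the `kind:value & kind:value` expression syntax, and the function names `annotate`, `parse_expression`, and `apply_expression` are invented here, since this page does not disclose the patent's actual domain-specific language or ontology. It annotates each element with ontology concepts, applies one expression string to the converted file, and returns a response together with the supporting element identifiers.

```python
import re

# Hypothetical toy ontology: concept name -> words that signal the concept.
ONTOLOGY = {
    "Payment": ["rent", "fee"],
    "Duration": ["term", "years"],
}

# Stand-in for a converted file: flat (element_id, element_type, text) records.
CONVERTED = [
    (1, "sentence", "Rent is due monthly"),
    (2, "sentence", "The term is five years"),
]


def words_of(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def annotate(elements, ontology):
    """Map each element_id to the set of ontology concepts found in its text."""
    return {
        element_id: {
            concept
            for concept, signals in ontology.items()
            if words_of(text) & set(signals)
        }
        for element_id, _element_type, text in elements
    }


def parse_expression(expression):
    """Parse 'concept:Payment & word:monthly' into [(kind, value), ...]."""
    terms = []
    for raw in expression.split("&"):
        kind, value = raw.strip().split(":", 1)
        if kind not in ("concept", "word"):
            raise ValueError(f"unknown term kind: {kind}")
        terms.append((kind, value.strip()))
    return terms


def apply_expression(expression, elements, annotations):
    """Return the ids of elements that satisfy every term of the expression."""
    terms = parse_expression(expression)
    hits = []
    for element_id, _element_type, text in elements:
        words = words_of(text)
        if all(
            value in annotations[element_id] if kind == "concept"
            else value.lower() in words
            for kind, value in terms
        ):
            hits.append(element_id)
    return hits


annotations = annotate(CONVERTED, ONTOLOGY)
matches = apply_expression("concept:Payment & word:monthly", CONVERTED, annotations)
response = {
    "question": "Is rent paid monthly?",
    "answer": bool(matches),
    "support": matches,  # element ids that justify the answer
}
```

Returning the matching element identifiers alongside the answer mirrors the dependent claims about displaying support and justification; the feedback step in the claims (using confirmed answers to improve accuracy) is left out of this sketch.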

Assignees

KPMG LLP

Inventors

Classifications

  • Classification of content, e.g. text, photographs or tables

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226)

  • Details of conversion of file system types or formats

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

  • Annotation, e.g. comment data or footnotes

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11537662B2 cover?
The invention relates to computer-implemented systems and methods for analyzing and standardizing various types of input data such as structured data, semi-structured data, unstructured data, and images and voice. Embodiments of the systems and the methods further provide for generating responses to specific questions based on the standardized input data.
Who is the assignee on this patent?
KPMG LLP
What technology area does this patent fall under?
Primary CPC classification G06F16/90332. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 27, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).