Domain-specific stopword removal from unstructured computer text using a neural network
US-10628471-B2 · Apr 21, 2020 · US
US10846341B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10846341-B2 |
| Application number | US-201816159088-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2018 |
| Priority date | Oct 13, 2017 |
| Publication date | Nov 24, 2020 |
| Grant date | Nov 24, 2020 |
Official abstract text for this publication.
The invention relates to computer-implemented systems and methods for analyzing and standardizing various types of input data such as structured data, semi-structured data, unstructured data, and images and voice. Embodiments of the systems and the methods further provide for generating responses to specific questions based on the standardized input data.
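As a rough illustration of the standardized data format the abstract refers to, the Python sketch below follows the claim language: a converted file that carries the document name, file path, file type, and a binary representation, plus a flat (non-hierarchical) list of elements, each with an identifier, a type, and key-value attributes, serialized as JSON. The class names, field names, base64 encoding of the binary payload, and sample values are all assumptions made for illustration; the publication does not disclose a concrete schema here.

```python
import base64
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Element:
    """One element of the converted file (hypothetical shape)."""
    element_id: str
    element_type: str
    attributes: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class ConvertedFile:
    """Standardized container; every field name here is an assumption."""
    name: str
    path: str
    file_type: str
    content: bytes  # binary representation of the source document
    elements: list = field(default_factory=list)  # flat list, no nesting

    def to_json(self) -> str:
        payload = asdict(self)
        # JSON has no bytes type, so the binary payload is base64-encoded.
        payload["content"] = base64.b64encode(self.content).decode("ascii")
        return json.dumps(payload, sort_keys=True)


doc = ConvertedFile("report.txt", "/tmp/report.txt", "text/plain",
                    b"Revenue rose 4%.")
doc.elements.append(Element("e1", "sentence", {"start": 0, "end": 16}))

# Round-trip through JSON, as another process receiving the file might.
restored = json.loads(doc.to_json())
```

Keeping elements in a flat list and pointing into the source by offset, rather than nesting them, is one simple way to read "non-hierarchical relationship format"; other representations would satisfy the same wording.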
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for analyzing at least one of structured and unstructured data, the method comprising: receiving at least one specific question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and applying an artificial intelligence process to the at least one input file, the artificial intelligence process comprising the steps of: generating, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generating the at least one element, wherein the at least one element includes an element identifier and an element type, and is stored in a non-hierarchical relationship format to other elements; generating at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; reading, via a machine review portion of the artificial intelligence process, the at least one expression; applying, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and applying the answer to the specific question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.
2. The method of claim 1, wherein the converted file is configured to interface with a dynamic, interpreted language.
3. The method of claim 1, wherein the converted file is configured to interface with Python.
4. The method of claim 1, wherein the converted file is (i) implemented in Python and includes computer-object representations as Python objects and (ii) serialized as JSON for inter-process communication.
5. The method of claim 1, wherein the converted file is configured for use with JSON, Swagger (YAML), and RESTful.
6. The method of claim 1, wherein the converted file further includes a name of the document, a file path for the document, a file type of the document, and a binary representation of the document.
7. The method of claim 1, wherein the at least one element further includes at least one attribute, wherein the at least one attribute comprises a key-value pair.
8. The method of claim 1, wherein the expression is configured to interface with the format of the converted file.
9. The method of claim 1, wherein the at least one element is generated and stored in a stand-off annotation format in an annotated file, wherein the at least one expression is applied to the annotated file to generate the output file.
10. The method of claim 9, wherein the expression string (i) specifies at least one of a programmatic logical operation and a pattern to search for in the annotated file and (ii) incorporates subject matter expertise for a particular question.
11. The method of claim 1, wherein the domain-specific language is stored in a comma-separated-value file.
12. A system for analyzing at least one of structured and unstructured data, the system comprising: a scanner, wherein the scanner is configured to receive at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and a server, wherein the server is configured to: receive at least one specific question and the scanned at least one input file; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generate the at least one element, wherein the at least one element includes an element identifier and an element type and is stored in a non-hierarchical relationship format to other elements; generate at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and apply the answer to the specific question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.
13. The system of claim 12, wherein the server includes an intelligent domain engine (IDE), wherein the IDE is configured to: receive the at least one expression, apply the at least one expression to the at least one input file, and output the answer to the specific question based on the applying.
14. The system of claim 13, wherein the IDE incorporates at least one of natural language processing, machine learning, annotation components, and manually-encoded expressions to classify and analyze the at least one input file.
15. The system of claim 12, wherein the server is further configured to: extract original source data and metadata from the at least one input file, store the extracted original source data in the converted file, generate the at least one element based on a conversion of the extracted metadata, store the generated at least one element in the converted file.
16. The system of claim 15, wherein the metadata is at least one of author information, page information, paragraph information, and font information.
17. The system of claim 15, wherein the extracted metadata is converted with a format-specific parser.
18. The system of claim 12, wherein the server is further configured to perform at least one of entity resolution and semantic annotation on the at least one input file.
19. The system of claim 18, wherein (i) the entity resolution determines a match between data associated with the at least one input file and data associated with at least one ontology and (ii) the semantic annotation connects the data associated with the at least one input file with the data associated with at least one ontology.
20. A system for analyzing at least one of structured and unstructured data, the system comprising: a server, wherein the server is configured to: receive at least one specific question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; generate the at least one element, wherein the at least one element includes an element identifier and an element type and is stored in a non-hierarchical relationship format to other elements; generate, by an artificial intelligence operator, at least one expression, wherein the expression comprises an expression string that is in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to generate an output file having an answer to the specific question; and apply the answer to
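The claims describe elements held in a stand-off annotation format and expressions, written in a domain-specific language stored in a comma-separated-value file, that are applied to the annotated file to answer a question. The Python sketch below is one possible reading of that flow, not the patented implementation. The three-column CSV layout (`question`, `element_type`, `pattern`), the use of regular expressions as a stand-in for the domain-specific language, the `answer` helper, and the sample text are all assumptions introduced for illustration. The feedback step recited in the claims is not modeled.

```python
import csv
import io
import re

text = "Acme Corp reported revenue of $12 million. Costs fell."

# Stand-off annotations: kept apart from the text and pointing into it by
# character offset. Sentences are found with a simple regex for brevity.
annotations = [
    {"id": f"s{i}", "type": "sentence", "start": m.start(), "end": m.end()}
    for i, m in enumerate(re.finditer(r"[^.\s][^.]*\.", text), start=1)
]

# Hypothetical expression file: one row per question, with the element type
# to search and a regex pattern whose first group is the answer.
EXPRESSIONS_CSV = r"""question,element_type,pattern
What revenue was reported?,sentence,revenue of (\$\d+ \w+)
"""
expressions = list(csv.DictReader(io.StringIO(EXPRESSIONS_CSV)))


def answer(question, text, annotations, expressions):
    """Apply the matching expressions to the annotated spans of the text."""
    for expr in expressions:
        if expr["question"] != question:
            continue
        pattern = re.compile(expr["pattern"])
        for ann in annotations:
            if ann["type"] != expr["element_type"]:
                continue
            match = pattern.search(text[ann["start"]:ann["end"]])
            if match:
                return {"answer": match.group(1), "element_id": ann["id"]}
    return None


result = answer("What revenue was reported?", text, annotations, expressions)
print(result)  # {'answer': '$12 million', 'element_id': 's1'}
```

Because the annotations only reference offsets, the source text is never modified, and new expressions can be added by editing the CSV rather than the code.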
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Active learning · CPC title
Supervised learning · CPC title
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
Classification of content, e.g. text, photographs or tables · CPC title