Domain-specific stopword removal from unstructured computer text using a neural network
US-10628471-B2 · Apr 21, 2020 · US
US11537662B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11537662-B2 |
| Application number | US-202017100019-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 20, 2020 |
| Priority date | Oct 13, 2017 |
| Publication date | Dec 27, 2022 |
| Grant date | Dec 27, 2022 |
Official abstract text for this publication.
The invention relates to computer-implemented systems and methods for analyzing and standardizing various types of input data such as structured data, semi-structured data, unstructured data, and images and voice. Embodiments of the systems and the methods further provide for generating responses to specific questions based on the standardized input data.
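To make the idea of a standardized data format more concrete, the following is a minimal sketch, not the patented implementation. It assumes, following the claim language, that each converted file is a flat (non-hierarchical) collection of elements, each carrying an element identifier and an element type. The class `Element`, the helper functions `convert_text` and `convert_table`, the identifier scheme, and the sample file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    """One unit of extracted content (hypothetical layout)."""
    element_id: str    # element identifier, unique across the converted file
    element_type: str  # element type, e.g. "paragraph" or "table_cell"
    text: str          # extracted content
    source: str        # metadata: the input file the content came from


def convert_text(source: str, raw: str) -> List[Element]:
    """Split plain text into paragraph elements (blank line = paragraph break)."""
    paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
    return [
        Element(f"{source}#p{i}", "paragraph", p, source)
        for i, p in enumerate(paragraphs)
    ]


def convert_table(source: str, rows: List[List[str]]) -> List[Element]:
    """Flatten a table into one element per cell; position lives in the id."""
    return [
        Element(f"{source}#r{r}c{c}", "table_cell", cell, source)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
    ]


# Two different input types end up in the same flat list of elements.
converted: List[Element] = (
    convert_text("lease.txt", "Term: 5 years.\n\nRent is due monthly.")
    + convert_table("fees.csv", [["fee", "amount"], ["late", "50"]])
)

# A flat lookup by identifier; no element is nested inside another.
by_id: Dict[str, Element] = {e.element_id: e for e in converted}
```

Keeping the elements in a flat list keyed by identifier is one simple way to read "non-hierarchical": later annotations can point at an element by its identifier without the element having to sit inside a document tree.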
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for analyzing at least one of structured and unstructured data, the method comprising: identifying at least one question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and applying an artificial intelligence process to the at least one input file, the artificial intelligence process comprising the steps of: generating, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type, and is stored in a non-hierarchical relationship format; applying a specific ontology to the converted file to perform semantic annotation to the converted file; generating, based on the semantic annotation, at least one expression, the at least one expression comprising one or more of specific words, relationships between specific words, and word patterns that identify specific content in a converted file, wherein the expression comprises an expression string in a domain-specific language; reading, via a machine review portion of the artificial intelligence process, the at least one expression; and applying, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to automatically generate a response to the question; and applying the answer to the at least one question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.
2. The method of claim 1, wherein the data format represents extracted data from the at least one input file and corresponding metadata.
3. The method of claim 1, wherein the at least one element is stored in an annotation format where the at least one element is stored separately from the at least one input file.
4. The method of claim 1, wherein the at least one expression specifies one or more words, a relationship between the one or more words and at least one pattern that identifies document features.
5. The method of claim 1, wherein the at least one expression represents one or more features to be utilized and one or more patterns of the features to be identified.
6. The method of claim 1, wherein the at least one expression is an input to an intelligent domain engine (IDE) that leverages natural language processing to systematically classify and analyze a corpus of documents.
7. The method of claim 6, wherein the intelligent domain engine further comprises a user interface to enable a user to modify the at least one expression.
8. The method of claim 1, wherein the response to the question is communicated via a user interface.
9. The method of claim 8, wherein the user interface displays support and justification associated with the response.
10. A system for analyzing at least one of structured and unstructured data, the system comprising: a scanner configured to receive at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; and a server, wherein the server is configured to: identify at least one question and the scanned at least one input file; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type and is stored in a non-hierarchical relationship format; apply a specific ontology to the converted file to resolve entities and perform semantic annotation, the entity resolution comprising one or more determinations of whether entities detected in the converted file refer to one or more real-world entities, and the semantic annotation comprising relating one or more phrases in the converted file to one or more concepts formally defined in the specific ontology; generate at least one expression, the at least one expression comprising one or more of specific words, relationships between specific words, and word patterns that identify specific content in a converted file, wherein the expression comprises an expression string in a domain-specific language; read, via a machine review portion of the artificial intelligence process, the at least one expression; and apply, via the machine review portion of the artificial intelligence process, the at least one expression to the converted file to automatically generate a response to the question; and apply the answer to the at least one question as feedback to the artificial intelligence process to improve the accuracy of the artificial intelligence process.
11. The system of claim 10, wherein the data format represents extracted data from the at least one input file and corresponding metadata.
12. The system of claim 10, wherein the at least one element is stored in an annotation format where the at least one element is stored separately from the at least one input file.
13. The system of claim 10, wherein the at least one expression specifies one or more words, a relationship between the one or more words and at least one pattern that identifies document features.
14. The system of claim 10, wherein the at least one expression represents one or more features to be utilized and one or more patterns of the features to be identified.
15. The system of claim 10, wherein the at least one expression is an input to an intelligent domain engine (IDE) that leverages natural language processing to systematically classify and analyze a corpus of documents.
16. The system of claim 15, wherein the intelligent domain engine further comprises a user interface to enable a user to modify the at least one expression.
17. The system of claim 10, wherein the response to the question is communicated via a user interface.
18. The system of claim 17, wherein the user interface displays support and justification associated with the response.
19. A system for analyzing at least one of structured and unstructured data, the system comprising: a server, wherein the server is configured to: identify at least one question and at least one input file to be analyzed, wherein the at least one input file comprises at least one of: text, an image, an audio file, a video file, a table, and a database; apply an artificial intelligence process to the at least one input file; generate, for the at least one input file, a converted file in a data format that is standardized for a plurality of input file types and that includes at least one element; wherein the at least one element is associated with an element identifier and an element type and is stored in a non-hierarchical relationship format; apply a specific ontology to the converted file to resolve entities and perform semantic annotation, the entity resolution comprising one or more determinations of whether entities detected in the converted file refer to one or more real-world entities, and the semantic annotation comprising relating one or more phrases in the converted file to one or more concepts formally defined in the specific ontology; generate, by an artificial intelligence operator, at least one expression, the at least one expression compris
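The claims describe expressions written as strings in a domain-specific language and applied to the converted file to produce a response. The patent text shown here does not give the grammar of that language, so the sketch below invents a toy one purely for illustration: a single proximity rule of the form `"word" within N of "word"`. The names `Proximity`, `parse_expression`, `matches`, and `answer`, as well as the sample elements, are assumptions and are not taken from the patent.

```python
import re
from typing import List, NamedTuple, Tuple


class Proximity(NamedTuple):
    """Parsed form of the toy rule: two words and a maximum token distance."""
    first: str
    second: str
    max_gap: int


# Hypothetical grammar: "rent" within 3 of "monthly" (single-word terms only).
_EXPR = re.compile(r'^"([^"]+)"\s+within\s+(\d+)\s+of\s+"([^"]+)"$')


def parse_expression(expr: str) -> Proximity:
    """Read an expression string and turn it into a rule object."""
    m = _EXPR.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    return Proximity(m.group(1).lower(), m.group(3).lower(), int(m.group(2)))


def matches(rule: Proximity, text: str) -> bool:
    """True if both words occur within max_gap token positions of each other."""
    tokens = re.findall(r"\w+", text.lower())
    firsts = [i for i, t in enumerate(tokens) if t == rule.first]
    seconds = [i for i, t in enumerate(tokens) if t == rule.second]
    return any(abs(i - j) <= rule.max_gap for i in firsts for j in seconds)


def answer(expr: str, elements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply the expression to (element_id, text) pairs; return the supporting hits."""
    rule = parse_expression(expr)
    return [(eid, text) for eid, text in elements if matches(rule, text)]


elements = [
    ("doc#p0", "The lease term is five years."),
    ("doc#p1", "Rent is payable monthly in advance."),
]
hits = answer('"rent" within 3 of "monthly"', elements)
```

Returning the matching element identifiers together with their text gives a caller something to display as support for the response, in the spirit of claims 9 and 18. The feedback step of the claims, in which answers are used to improve the process, is not modeled here.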
Classification of content, e.g. text, photographs or tables · CPC title
Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title
Details of conversion of file system types or formats · CPC title
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
Annotation, e.g. comment data or footnotes · CPC title