Community-based reporting and analysis system and method

US10235357B2 · US · B2

Patent metadata
Publication number: US-10235357-B2
Application number: US-201715478550-A
Country: US
Kind code: B2
Filing date: Apr 4, 2017
Priority date: Apr 4, 2017
Publication date: Mar 19, 2019
Grant date: Mar 19, 2019

Abstract

A computer-implemented method for analyzing documents includes a processor receiving one or more documents, from a community-based document delivery system, related to a domain of interest; the processor identifying and extracting one or more data items from the one or more documents; determining if an identified and extracted data item comprises a true mention of a named entity; analyzing a context of the true mention of the named entity in the document; and determining, based on the analyzed context, if the document is a true document.
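The abstract's pipeline (receive documents, extract data items, check for true mentions, analyze their context, classify the document) can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. Documents are plain strings and data items are word n-grams. The true-mention check is an exact lookup in a set of known terms, standing in for the trained neural network the claims describe. The patent does not say how context is scored, so counting hypothetical domain keywords near the mention is an assumption. All names (`Domain`, `context_score`, `is_true_document`, and so on) are illustrative.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical domain of interest: lowercase known terms plus context keywords."""
    known_terms: set[str]
    context_keywords: set[str] = field(default_factory=set)


def tokenize(document: str) -> list[str]:
    # Lowercased word tokens; a real system would use a full NLP parser.
    return [t.lower() for t in re.findall(r"\w+", document)]


def extract_data_items(tokens: list[str], max_words: int = 3) -> list[tuple[str, int]]:
    # Candidate data items: every 1..max_words word n-gram with its start index.
    return [
        (" ".join(tokens[i:i + n]), i)
        for i in range(len(tokens))
        for n in range(1, max_words + 1)
        if i + n <= len(tokens)
    ]


def find_true_mentions(tokens: list[str], domain: Domain) -> list[tuple[str, int]]:
    # Stand-in for the trained neural network: exact match against the known terms.
    return [(item, i) for item, i in extract_data_items(tokens) if item in domain.known_terms]


def context_score(tokens: list[str], start: int, domain: Domain, window: int = 5) -> int:
    # Count distinct domain keywords within `window` tokens of the mention's first word.
    nearby = tokens[max(0, start - window): start + window + 1]
    return len(set(nearby) & domain.context_keywords)


def is_true_document(document: str, domain: Domain, min_score: int = 1) -> bool:
    # A document is "true" if any true mention appears in a domain-relevant context.
    tokens = tokenize(document)
    return any(
        context_score(tokens, start, domain) >= min_score
        for _, start in find_true_mentions(tokens, domain)
    )


if __name__ == "__main__":
    domain = Domain(known_terms={"acme corp"}, context_keywords={"breach", "outage"})
    print(is_true_document("Users report an outage at Acme Corp today.", domain))  # True
    print(is_true_document("Acme Corp released a new logo.", domain))  # False
```

Both example documents mention the known term; only the first places it near a domain keyword, which is the distinction between a mere mention and a "true document" that the abstract draws.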

Claims

We claim:

1. A community-based reporting and analysis system comprising a program of instructions stored on a non-transitory computer-readable storage medium, wherein when executed, the program of instructions cause a processor to: generate a domain of interest; based on the domain of interest, identify known terms in one or more resources, the known terms corresponding to named entities, wherein, for one or more known terms, the processor: tweaks one or more known terms to produce an expanded list of known terms, each tweaked known term in the expanded list of known terms corresponding to the named entity, and stores the known terms and the expanded list of known terms with a link from the expanded list of known terms to an original known term; use the known terms to train a neural network by iteratively applying the known terms to an input layer of the neural network and reading an output of an output layer of the neural network; receive a document related to the domain of interest; apply a natural language processing system to parse the document to identify one or more data items in the document; apply the parsed document to the trained neural network, thereby causing the neural network to: extract one or more of the identified data items from the parsed document, comprising the neural network: applying a data item to a series of layers of the neural network; and providing an output to the processor, determine that the data item comprises a true mention of a named entity based on the output, wherein the processor determines that the data item is a true mention when the neural network output indicates the data item matches exactly a known term in a list of named entities; analyze circumstances in which the true mention of the named entity appears in the parsed document; and determine, based on the analyzed characteristics, that the document is a true document.

2. The community-based reporting and analysis system of claim 1, wherein the processor uses the tweaked known terms to train the neural network by iteratively applying each of the tweaked known terms to an input layer of the neural network and reading an output of an output layer of the neural network; and wherein the processor generates the domain of interest comprises the processor performing one or more of: receiving predefined events, executing the natural language processor to search the document to identify specific words, terms, and other data elements using named entity recognition, and executing a Web crawler to search for occurrences of the known terms using named entity recognition.

3. The community-based reporting and analysis system of claim 2, wherein, for one or more tweaked known terms, wherein the processor, using the natural language processing system: applies to the document, one or more tweaked known terms of the expanded list of known terms; follows the link to the original known term to verify the tweaked known term from the expanded list of known terms corresponds to the original known term; and based on the correspondence, further verifies that the data item is a true mention of the known term.

4. The community-based reporting and analysis system of claim 1, wherein the processor causes an alert to issue when a document is classified as a true document.

5. The community-based reporting and analysis system of claim 1, wherein the document comprises unstructured and semi-structured data.

6. The community-based reporting and analysis system of claim 1, wherein the processor determines if the mention is a true mention when the mention matches approximately a known term in a list of named entities.

7. The community-based reporting and analysis system of claim 1, wherein the processor analyzing the context of the true mention of the named entity in the document comprises the processor comparing the context of the true mention of the named entity in the document with the domain of interest.

8. The community-based reporting and analysis system of claim 1, wherein the document comprises image data item comprising image data of an image, and wherein the processor: obtains a particular pattern of pixels within the image; compares the pixel pattern to an object stored in memory of the system; and classifies the data item comprising image data as a true mention based on the comparison.

9. The community-based reporting and analysis system of claim 1, wherein the document comprises one or more of social network site (SNS) messages, short message service messages, and instant messenger messages.

10. A computer-implemented method for analyzing documents, comprising: a processor receiving one or more documents, from a community-based document delivery system, related to a domain of interest; the processor identifying and extracting one or more data items from a document of the one or more documents, comprising the processor: accessing one or more lists of known terms, the lists relevant to the domain of interest, and applying a natural language processing system to identify a data item in the document based on an exact match of the data item to a known term in a list of known terms; the processor determining if an identified and extracted data item comprises a true mention of a known term, comprising: applying the identified and extracted data item to a trained neural network, causing the trained neural network to: apply the identified and extracted data item to a plurality of weighted layers; and produce an output indicating the data item is an exact match of a known term, wherein the known term is a named entity, and saving the known term with a relation to the document as a true mention comprising a known term, document pair; analyzing circumstances of the true mention of the known term in the document; and determining, based on the analyzed circumstances, that the document is a true document.

11. The method of claim 10, wherein the processor receives a statement of the domain of interest, and based on the statement, identifies known terms in one or more resources, the known terms corresponding to named entities, and wherein the processor, in determining the true mention of identifies the known term as a named entity.

12. The method of claim 10, wherein, for one or more known terms, the processor: tweaks the one or more known terms to produce an expanded list of known terms, each tweaked known term in the expanded list of known terms corresponding to an original known term; and stores the tweaked known terms as the expanded list of known terms with a link from the list of expanded known terms to the original known term, and wherein the processor: applies to the document, one or more tweaked known terms of the expanded list of known terms; follows the link to the original known term to verify the tweaked known term from the expanded list of known terms corresponds to the original known term; and based on the correspondence, further verifies that the data item is a true mention of the known term.

13. The method of claim 10, wherein the processor causes an alert to issue when the document is classified as a true document.

14. The method of claim 10, wherein the [document] comprises unstructured and semi-structured data.

15. The method of claim 10, wherein the document comprises image data item comprising image data of an image, and wherein the processor: obtains a particular pattern of pixels within the image; compares the pixel pattern to an object stored in memory of the system; and classifies the data item comprising image data as a true mention based on the comparison.
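Claims 1, 3, and 12 describe tweaking known terms into an expanded list that keeps a link back to each original term, then following that link to verify a match; claim 6 also allows an approximate match. The Python sketch below illustrates that bookkeeping under stated assumptions and omits the neural network entirely. The patent does not list its tweak rules, so `tweak`, `normalize`, and the `CORPORATE_SUFFIXES` list are hypothetical. The approximate match uses the standard library's `difflib` similarity ratio as a stand-in technique, and the 0.85 cutoff is an arbitrary choice.

```python
import difflib
import re
from typing import Optional

# Hypothetical tweak rules; the patent does not specify how terms are tweaked.
CORPORATE_SUFFIXES = (" inc", " corp", " llc")


def normalize(text: str) -> str:
    # Lowercase and strip punctuation (keeps letters, digits, underscores, whitespace).
    return re.sub(r"[^\w\s]", "", text.lower())


def tweak(term: str) -> set[str]:
    """Produce variant spellings of one known term (all hypothetical rules)."""
    base = normalize(term)
    variants = {term.lower(), base, base.replace(" ", "")}
    for suffix in CORPORATE_SUFFIXES:
        if base.endswith(suffix):
            variants.add(base[: -len(suffix)])
    return variants


def build_expanded_list(known_terms: set[str]) -> dict[str, str]:
    # Expanded list: tweaked variant -> original known term (the stored "link").
    # If two originals produce the same variant, the first one seen keeps it.
    expanded: dict[str, str] = {}
    for original in known_terms:
        for variant in tweak(original):
            expanded.setdefault(variant, original)
    return expanded


def resolve_mention(data_item: str, known_terms: set[str],
                    expanded: dict[str, str], cutoff: float = 0.85) -> Optional[str]:
    key = normalize(data_item)
    candidate = key if key in expanded else None
    if candidate is None:
        # Approximate match (cf. claim 6) via difflib's similarity ratio.
        close = difflib.get_close_matches(key, list(expanded), n=1, cutoff=cutoff)
        candidate = close[0] if close else None
    if candidate is None:
        return None
    # Follow the link back to the original term and confirm it is still a known term.
    original = expanded[candidate]
    return original if original in known_terms else None


if __name__ == "__main__":
    known = {"Acme Corp."}
    expanded = build_expanded_list(known)
    print(resolve_mention("ACME", known, expanded))       # Acme Corp.
    print(resolve_mention("Acme Crop", known, expanded))  # Acme Corp.
    print(resolve_mention("Globex", known, expanded))     # None
```

Storing the link as a mapping from each variant to its original means a hit on any variant is reported against the canonical term, so the verification step in claims 3 and 12 reduces here to a lookup plus a membership check.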

Frequently asked questions

What does patent US10235357B2 cover?
A system and computer-implemented method that receive documents related to a domain of interest from a community-based document delivery system, identify and extract data items from them, determine whether a data item is a true mention of a named entity, analyze the context of that mention, and decide from that context whether the document is a "true document." The independent claims perform the true-mention check with a trained neural network.
Who is the assignee on this patent?
Architecture Tech Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/295 (named entity recognition). Mapped technology areas include Physics.
When was this patent published?
Published March 19, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).