Method and system for learning transferable feature representations from a source domain for a target domain
US-2018218284-A1 · Aug 2, 2018 · US
US10235357B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10235357-B2 |
| Application number | US-201715478550-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 4, 2017 |
| Priority date | Apr 4, 2017 |
| Publication date | Mar 19, 2019 |
| Grant date | Mar 19, 2019 |
A computer-implemented method for analyzing documents includes a processor receiving one or more documents, from a community-based document delivery system, related to a domain of interest; the processor identifying and extracting one or more data items from the one or more documents; determining if an identified and extracted data item comprises a true mention of a named entity; analyzing a context of the true mention of the named entity in the document; and determining, based on the analyzed context, if the document is a true document.
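The following is a minimal sketch of the abstract's pipeline, for orientation only. It assumes plain-text documents, a hand-supplied set of lowercase known terms, and a hypothetical set of domain keywords. The claims use a trained neural network for the true-mention decision; this sketch swaps in an exact dictionary lookup over word n-grams. It also models "analyzing a context" as the share of domain keywords in a hypothetical five-token window around each mention, compared against a hypothetical threshold. None of the function names below come from the patent.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for the NLP parsing step."""
    return re.findall(r"\w+", text.lower())


def find_true_mentions(tokens: list[str], known_terms: set[str],
                       max_len: int = 3) -> list[tuple[str, int, int]]:
    """Return (term, start, end) for each token n-gram that exactly matches a known term.

    Assumes known_terms are lowercase, space-separated phrases.
    """
    mentions = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in known_terms:
                mentions.append((phrase, i, i + n))
    return mentions


def context_score(tokens: list[str], start: int, end: int,
                  domain_keywords: set[str], window: int = 5) -> float:
    """Share of tokens around a mention that are domain keywords (hypothetical measure)."""
    context = tokens[max(0, start - window):start] + tokens[end:end + window]
    if not context:
        return 0.0
    return sum(tok in domain_keywords for tok in context) / len(context)


def is_true_document(text: str, known_terms: set[str], domain_keywords: set[str],
                     threshold: float = 0.2) -> bool:
    """Treat a document as 'true' if any true mention sits in a domain-relevant context."""
    tokens = tokenize(text)
    return any(
        context_score(tokens, start, end, domain_keywords) >= threshold
        for _, start, end in find_true_mentions(tokens, known_terms)
    )
```

As an example, take the hypothetical known terms `{"acme pharma", "zentrox"}` and domain keywords `{"recall", "contamination", "batch", "fda"}`. The sentence "Acme Pharma announced a recall after contamination was found in one batch." scores 0.4 and is classified as a true document. The sentence "I watched a movie where Zentrox was the villain's name." contains a true mention but scores 0.0, so it is not.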
We claim:

1. A community-based reporting and analysis system comprising a program of instructions stored on a non-transitory computer-readable storage medium, wherein when executed, the program of instructions cause a processor to: generate a domain of interest; based on the domain of interest, identify known terms in one or more resources, the known terms corresponding to named entities, wherein, for one or more known terms, the processor: tweaks one or more known terms to produce an expanded list of known terms, each tweaked known term in the expanded list of known terms corresponding to the named entity, and stores the known terms and the expanded list of known terms with a link from the expanded list of known terms to an original known term; use the known terms to train a neural network by iteratively applying the known terms to an input layer of the neural network and reading an output of an output layer of the neural network; receive a document related to the domain of interest; apply a natural language processing system to parse the document to identify one or more data items in the document; apply the parsed document to the trained neural network, thereby causing the neural network to: extract one or more of the identified data items from the parsed document, comprising the neural network: applying a data item to a series of layers of the neural network; and providing an output to the processor, determine that the data item comprises a true mention of a named entity based on the output, wherein the processor determines that the data item is a true mention when the neural network output indicates the data item matches exactly a known term in a list of named entities; analyze circumstances in which the true mention of the named entity appears in the parsed document; and determine, based on the analyzed characteristics, that the document is a true document.

2. The community-based reporting and analysis system of claim 1, wherein the processor uses the tweaked known terms to train the neural network by iteratively applying each of the tweaked known terms to an input layer of the neural network and reading an output of an output layer of the neural network; and wherein the processor generates the domain of interest comprises the processor performing one or more of: receiving predefined events, executing the natural language processor to search the document to identify specific words, terms, and other data elements using named entity recognition, and executing a Web crawler to search for occurrences of the known terms using named entity recognition.

3. The community-based reporting and analysis system of claim 2, wherein, for one or more tweaked known terms, wherein the processor, using the natural language processing system: applies to the document, one or more tweaked known terms of the expanded list of known terms; follows the link to the original known term to verify the tweaked known term from the expanded list of known terms corresponds to the original known term; and based on the correspondence, further verifies that the data item is a true mention of the known term.

4. The community-based reporting and analysis system of claim 1, wherein the processor causes an alert to issue when a document is classified as a true document.

5. The community-based reporting and analysis system of claim 1, wherein the document comprises unstructured and semi-structured data.

6. The community-based reporting and analysis system of claim 1, wherein the processor determines if the mention is a true mention when the mention matches approximately a known term in a list of named entities.

7. The community-based reporting and analysis system of claim 1, wherein the processor analyzing the context of the true mention of the named entity in the document comprises the processor comparing the context of the true mention of the named entity in the document with the domain of interest.

8. The community-based reporting and analysis system of claim 1, wherein the document comprises image data item comprising image data of an image, and wherein the processor: obtains a particular pattern of pixels within the image; compares the pixel pattern to an object stored in memory of the system; and classifies the data item comprising image data as a true mention based on the comparison.

9. The community-based reporting and analysis system of claim 1, wherein the document comprises one or more of social network site (SNS) messages, short message service messages, and instant messenger messages.

10. A computer-implemented method for analyzing documents, comprising: a processor receiving one or more documents, from a community-based document delivery system, related to a domain of interest; the processor identifying and extracting one or more data items from a document of the one or more documents, comprising the processor: accessing one or more lists of known terms, the lists relevant to the domain of interest, and applying a natural language processing system to identify a data item in the document based on an exact match of the data item to a known term in a list of known terms; the processor determining if an identified and extracted data item comprises a true mention of a known term, comprising: applying the identified and extracted data item to a trained neural network, causing the trained neural network to: apply the identified and extracted data item to a plurality of weighted layers; and produce an output indicating the data item is an exact match of a known term, wherein the known term is a named entity, and saving the known term with a relation to the document as a true mention comprising a known term, document pair; analyzing circumstances of the true mention of the known term in the document; and determining, based on the analyzed circumstances, that the document is a true document.

11. The method of claim 10, wherein the processor receives a statement of the domain of interest, and based on the statement, identifies known terms in one or more resources, the known terms corresponding to named entities, and wherein the processor, in determining the true mention of identifies the known term as a named entity.

12. The method of claim 10, wherein, for one or more known terms, the processor: tweaks the one or more known terms to produce an expanded list of known terms, each tweaked known term in the expanded list of known terms corresponding to an original known term; and stores the tweaked known terms as the expanded list of known terms with a link from the list of expanded known terms to the original known term, and wherein the processor: applies to the document, one or more tweaked known terms of the expanded list of known terms; follows the link to the original known term to verify the tweaked known term from the expanded list of known terms corresponds to the original known term; and based on the correspondence, further verifies that the data item is a true mention of the known term.

13. The method of claim 10, wherein the processor causes an alert to issue when the document is classified as a true document.

14. The method of claim 10, wherein the document comprises unstructured and semi-structured data.

15. The method of claim 10, wherein the document comprises image data item comprising image data of an image, and wherein the processor: obtains a particular pattern of pixels within the image; compares the pixel pattern to an object stored in memory of the system; and classifies the data item comprising image data as a true mention based on the comparison.
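Claims 1, 3, 6 and 12 describe an expanded list of "tweaked" known terms, each linked back to its original, with true mentions found by exact or approximate matching. The patent does not say what a tweak is. The sketch below therefore assumes hypothetical tweaks: case and punctuation folding, plus dropping a trailing corporate suffix. It uses the standard library's `difflib.get_close_matches` as a stand-in for approximate matching, and it does not model the neural network from claim 1. The stored link is a plain dict from variant to original, so following the link is a single lookup.

```python
from __future__ import annotations

import difflib
import re

# Hypothetical tweak rule: corporate suffixes that may be dropped from a term.
CORPORATE_SUFFIXES = {"inc", "corp", "ltd", "llc"}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def tweak(term: str) -> set[str]:
    """Produce hypothetical variants ("tweaked known terms") of one known term."""
    spaced = normalize(term)  # "Zentrox-Bio Inc." -> "zentrox bio inc"
    joined = " ".join(re.sub(r"[^\w\s]", "", term.lower()).split())  # -> "zentroxbio inc"
    variants = set()
    for form in (spaced, joined):
        variants.add(form)
        words = form.split()
        if len(words) > 1 and words[-1] in CORPORATE_SUFFIXES:
            variants.add(" ".join(words[:-1]))  # drop the trailing suffix
    return variants


def build_expanded_list(known_terms: list[str]) -> dict[str, str]:
    """Map every tweaked variant to its original known term (the stored link)."""
    expanded: dict[str, str] = {}
    for original in known_terms:
        for variant in tweak(original):
            expanded.setdefault(variant, original)  # first original wins on collision
    return expanded


def resolve(item: str, known_terms: list[str], expanded: dict[str, str],
            cutoff: float = 0.85) -> tuple[str, str] | None:
    """Return (original known term, match kind) for a data item, or None."""
    if item in known_terms:
        return item, "exact"
    key = normalize(item)
    if key in expanded:
        return expanded[key], "expanded"  # follow the link back to the original
    close = difflib.get_close_matches(key, list(expanded), n=1, cutoff=cutoff)
    if close:
        return expanded[close[0]], "approximate"
    return None
```

Take `known = ["Zentrox-Bio Inc.", "Acme Pharma"]` as an example (hypothetical names). Here `resolve("ZentroxBio", known, expanded)` returns `("Zentrox-Bio Inc.", "expanded")`, and the misspelling `"Zentrx Bio"` reaches the same original through the approximate path. When two originals produce the same variant, `setdefault` keeps the first one. That is only one of several reasonable tie-breaking choices.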
Named entity recognition · CPC title
Semantic analysis · CPC title
Document management systems · CPC title
Validation · CPC title
into predefined classes · CPC title