Language model adaptation for specific texts
US-2015370784-A1 · Dec 24, 2015 · US
US9588958B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9588958-B2 |
| Application number | US-201213535638-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2012 |
| Priority date | Oct 10, 2006 |
| Publication date | Mar 7, 2017 |
| Grant date | Mar 7, 2017 |
Official abstract text for this publication.
Methods are described for performing classification (categorization) of text documents written in various languages. Language-independent semantic structures are constructed before classifying documents. These structures reflect lexical, morphological, syntactic, and semantic properties of documents. The methods suggested are able to perform cross-language text classification which is based on document properties reflecting their meaning. The methods are applicable to genre classification, topic detection, news analysis, authorship analysis, etc.
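To make the idea of language-independent features concrete, here is a minimal sketch in Python. It is not the analysis described in the patent: the full syntactic and semantic analysis is replaced by a hypothetical word lookup, and the `PARENT` hierarchy, the `LEXICON` table, and the function names are all assumptions made for illustration. It only shows how texts in two languages could map to the same semantic-class features, with each class also contributing its ancestor classes.

```python
from collections import Counter

# Hypothetical toy hierarchy: child semantic class -> parent semantic class.
PARENT = {
    "DOG": "ANIMAL",
    "CAT": "ANIMAL",
    "ANIMAL": "ENTITY",
    "CAR": "VEHICLE",
    "VEHICLE": "ENTITY",
}

# Hypothetical lexicon: (language, word) -> semantic class.
# A stand-in for real syntactic and semantic analysis.
LEXICON = {
    ("en", "dog"): "DOG",
    ("es", "perro"): "DOG",
    ("en", "cat"): "CAT",
    ("es", "gato"): "CAT",
    ("en", "car"): "CAR",
    ("es", "coche"): "CAR",
}


def ancestors(semantic_class):
    """Return the class followed by all of its ancestors in the hierarchy."""
    chain = [semantic_class]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def extract_features(text, language):
    """Count semantic classes (and their ancestors) for the known words in text."""
    counts = Counter()
    for token in text.lower().split():
        semantic_class = LEXICON.get((language, token.strip(".,;:!?")))
        if semantic_class is None:
            continue  # word not covered by the toy lexicon
        counts.update(ancestors(semantic_class))
    return counts


english = extract_features("The dog chased the cat.", "en")
spanish = extract_features("El perro persiguió al gato.", "es")
print(english == spanish)  # both sentences yield the same feature counts
```

Because the features are semantic classes rather than surface words, a model built from texts in one language could in principle be applied to texts in another, which is the property the abstract describes.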
Opening claim text (preview).
We claim:
1. A method of performing text classification based on language-independent text features, the method comprising: performing, by a processor, a first syntactic and semantic analysis of a training natural language text to produce a first plurality of language-independent semantic structures representing a plurality of sentences of the training natural language text; producing, based on the first plurality of language-independent semantic structures, a text classifier model; performing a second syntactic and semantic analysis of an input natural language text to produce a second plurality of language-independent semantic structures representing a plurality of sentences of the input natural language text; extracting, using the second plurality of language-independent semantic structures, a set of features, wherein at least one feature references a semantic class of a language-independent semantic hierarchy comprising a plurality of semantic classes, in which the semantic class exhibits one or more properties inherited from its parent semantic class; applying the text classifier model to the set of features to produce a classification spectrum comprising a plurality of weight values, wherein each weight value reflects a degree of association of the input natural language text with a particular category of natural language texts; and associating the input natural language text with one or more categories using the classification spectrum.
2. The method of claim 1, wherein the second syntactic and semantic analysis further includes determining a grammatical feature of the input natural language text.
3. The method of claim 1, wherein the second syntactic and semantic analysis further includes determining a lexical feature of the input natural language text.
4. The method of claim 1, wherein the second syntactic and semantic analysis further includes determining a syntactic feature of the input natural language text.
5. The method of claim 1, wherein the second syntactic and semantic analysis further includes determining a semantic feature of the input natural language text.
6. The method of claim 1, wherein the second syntactic and semantic analysis further includes generating a syntactic structure of a sentence of the input natural language text.
7. The method of claim 1, wherein the categories are represented by language independent categories.
8. A non-transitory computer readable storage medium comprising executable instructions for causing a computing system to perform operations comprising: performing a first syntactic and semantic analysis of a training natural language text to produce a first plurality of language-independent semantic structures representing a plurality of sentences of the training natural language text; producing, based on the first plurality of language-independent semantic structures, a text classifier model; performing a second syntactic and semantic analysis of an input natural language text to produce a second plurality of language-independent semantic structures representing a plurality of sentences of the input natural language text; extracting, using the second plurality of language-independent semantic structures, a set of features, wherein at least one feature references a semantic class of a language-independent semantic hierarchy comprising a plurality of semantic classes, in which the semantic class exhibits one or more properties inherited from its parent semantic class; applying the text classifier model to the set of features to produce a classification spectrum comprising a plurality of weight values, wherein each weight value references a degree of association of the input natural language text with a particular category of natural language texts; and associating the input natural language text with one or more categories using the classification spectrum.
9. The non-transitory computer readable storage medium of claim 8, wherein the second syntactic and semantic analysis further includes determining a grammatical feature of the input natural language text.
10. The non-transitory computer readable medium of claim 8, wherein the second syntactic and semantic analysis further includes determining a lexical feature of the input natural language text.
11. The non-transitory computer readable medium of claim 8, wherein the second syntactic and semantic analysis further includes determining a syntactic feature of the input natural language text.
12. The non-transitory computer readable medium of claim 8, wherein the second syntactic and semantic analysis further includes determining a semantic feature of the input natural language text.
13. The non-transitory computer readable medium of claim 8, wherein the second syntactic and semantic analysis further includes generating a syntactic structure of a sentence of the input natural language text.
14. The non-transitory computer readable medium of claim 8, wherein the categories are represented by language independent categories.
15. A computer system adapted to perform text classification based on language-independent text features, the computer system comprising: a feature extractor adapted to perform operations comprising: performing a first syntactic and semantic analysis of a training natural language text to produce a first plurality of language-independent semantic structures representing a plurality of sentences of the training natural language text; producing, based on the first plurality of language-independent semantic structures, a text classifier model; performing a second syntactic and semantic analysis of an input natural language text to produce a second plurality of language-independent semantic structures representing a plurality of sentences of the input natural language text; extracting, using the second plurality of language-independent semantic structures, a set of features, wherein at least one feature references a semantic class of a language-independent semantic hierarchy comprising a plurality of semantic classes, in which the semantic class exhibits one or more properties inherited from its parent semantic class; and a text classifier adapted to perform operations comprising: applying the text classifier model to the set of features to generate a classification spectrum comprising a plurality of weight values, wherein each weight value references a degree of association of the input natural language text with a particular category of natural language texts; and associating the input natural language text with one or more categories using the classification spectrum.
16. The computer system of claim 15, wherein the feature extractor is further adapted to perform operations comprising: determining a grammatical feature of the input natural language text.
17. The computer system of claim 15, wherein the feature extractor is further adapted to perform operations comprising: determining a lexical feature of the input natural language text.
18. The computer system of claim 15, wherein the feature extractor is further adapted to perform operations comprising: determining a syntactic feature of the input natural language text.
19. The computer system of claim 15, wherein the feature extractor is further adapted to perform operations comprising: determining a semantic feature of the input natural language text.
20. The computer system of claim 15, wherein the feature extractor is further adapted to perform operations comprising: generating a syntactic structure of a se
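The last two steps of the independent claims (producing a classification spectrum of weight values, then associating the text with one or more categories) can be sketched as follows. This is an illustrative sketch only: the claims do not say how the classifier model computes its weights, so a simple per-category centroid with cosine similarity is swapped in here, and the function names, the `threshold` parameter, and the sample categories are all hypothetical. Features are assumed to be counts of semantic classes already extracted from the text.

```python
import math
from collections import Counter, defaultdict


def train(labelled_features):
    """Build a toy model: one summed feature-count vector per category."""
    model = defaultdict(Counter)
    for category, features in labelled_features:
        model[category].update(features)
    return dict(model)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classification_spectrum(model, features):
    """Return one weight per category: the degree of association with it."""
    return {category: cosine(features, centroid)
            for category, centroid in model.items()}


def assign_categories(spectrum, threshold=0.5):
    """Keep every category whose weight reaches the (assumed) threshold."""
    chosen = [c for c, w in spectrum.items() if w >= threshold]
    return sorted(chosen, key=lambda c: spectrum[c], reverse=True)


# Hypothetical training data: (category, semantic-class counts).
model = train([
    ("pets", {"DOG": 2, "ANIMAL": 2}),
    ("pets", {"CAT": 1, "ANIMAL": 1}),
    ("transport", {"CAR": 2, "VEHICLE": 2}),
])
spectrum = classification_spectrum(model, {"CAT": 1, "ANIMAL": 1})
print(spectrum)
print(assign_categories(spectrum))
```

Returning the full spectrum before choosing categories mirrors the claim wording, which allows a text to be associated with more than one category; the threshold rule is just one possible way to make that choice.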
Rule-based translation · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Morphological analysis · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Semantic analysis · CPC title