Information processing method and apparatus
US-2016321541-A1 · Nov 3, 2016 · US
US9984068B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9984068-B2 |
| Application number | US-201514858413-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 18, 2015 |
| Priority date | Sep 18, 2015 |
| Publication date | May 29, 2018 |
| Grant date | May 29, 2018 |
Official abstract text for this publication.
Systems, apparatus, computer-readable media, and methods to provide filtering and/or search based at least in part on semantic representations of words in a document subject to the filtering and/or search are disclosed. Furthermore, key words for conducting the filtering and/or search, such as taboo words and/or search terms, may be semantically compared to the semantic representation of the words in the document. A common semantic vector space, such as a base language semantic vector space, may be used to compare the key word semantic vectors and the semantic vectors of the words of the document, regardless of the native language in which the document is written or the language in which the key words are provided.
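The comparison in a common vector space can be illustrated with a minimal sketch. It assumes, following claims 3 and 8, that a native-language vector is mapped into the base-language space by a translation matrix, and it uses the cosine distance named in claim 5. The helper names, the 2-D vectors, and the matrix values are hypothetical toy data, not taken from the patent.

```python
import math

def to_base_space(native_vec, translation_matrix):
    """Map a native-language vector into the base space (matrix-vector product)."""
    return [sum(m * x for m, x in zip(row, native_vec)) for row in translation_matrix]

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical toy data: a translation matrix that swaps the two axes.
TRANSLATION_MATRIX = [[0.0, 1.0],
                      [1.0, 0.0]]
native_word_vec = [0.0, 1.0]    # document word, in its native-language space
keyword_base_vec = [1.0, 0.0]   # key word, already in the base-language space

base_word_vec = to_base_space(native_word_vec, TRANSLATION_MATRIX)
distance = cosine_distance(base_word_vec, keyword_base_vec)
```

In this toy case the translated word vector coincides with the key word vector, so the distance is zero. In practice the translation matrix would presumably be learned from paired vocabulary, which this sketch does not attempt.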
Opening claim text (preview).
The claimed invention is:
1. One or more non-transitory computer-readable medium comprising computer-executable instruction that, when executed by one or more processors, cause the one or more processors to at least: in response to receiving electronic content to be delivered to a destination address, identify a first word in the electronic content and a second word in the electronic content; determine a first base language semantic vector of the first word; determine a second base language semantic vector of the second word; determine, for a keyword, a key word base language semantic vector, the keyword being a taboo word; determine a first distance between the first base language semantic vector and the key word base language semantic vector; determine a second distance between the second base language semantic vector and the key word base language semantic vector; determine that the first distance is less than a threshold distance; determine that the second distance is less than the threshold distance; determine a sum of the first distance and the second distance; determine a score of the electronic content based at least in part on the sum, wherein the score indicates a relevance of the electronic content to the key word; determine that the electronic content is not to be delivered to the destination address based at least in part on the score of the electronic content; and prevent the electronic content from being delivered to the destination address.
2. The one or more non-transitory computer-readable medium of claim 1, wherein the computer-executable instructions further cause the one or more processors to sequester the electronic content, when the electronic content is not to be delivered to the destination address.
3. The one or more non-transitory computer-readable medium of claim 1, wherein the determining of the first base language semantic vector includes: determining a native language semantic vector corresponding to the first word; and transforming, based at least in part on a native language-to-base language translation matrix, the native language semantic vector to the first base language semantic vector.
4. The one or more non-transitory computer-readable medium of claim 1, wherein the determining of the key word base language semantic vector includes: determining a key word native language semantic vector corresponding to the key word; and transforming, based at least in part on a native language-to-base language translation matrix, the key word native language semantic vector to the key word base language semantic vector.
5. The one or more non-transitory computer-readable medium of claim 1, wherein the determining of the first distance includes determining at least one of: a cosine distance between the first base language semantic vector and the key word base language semantic vector, or an Euclidean distance between the first base language semantic vector and the key word base language semantic vector.
6. The one or more non-transitory computer-readable medium of claim 1, wherein the computer-executable instructions further cause the one or more processors to: determine a first relevance between a first training document and the key word, the first training document having a first known filtering status; determine a second relevance between a second training document and the key word, the second training document having a second known filtering status; determine, with a filtering model, a filtering status for a plurality of training documents based at least in part on the first relevance and the second relevance; compare the filtering status to the first known filtering status; compare the filtering status to the second known filtering status; and train the filtering model based at least in part on a result of the comparing of the filtering status to the first known filtering status and the comparing of the filtering status to the second known filtering status.
7. A system, comprising: at least one memory that stores computer-executable instructions; and at least one processor to access the at least one memory, the computer-executable instructions, when executed, to cause the at least one processor to at least: in response to receiving electronic content to be delivered to a destination address, determine a first base language semantic vector corresponding to a first word in the electronic content; determine a second base language semantic vector corresponding to a second word in the electronic content; determine, for a key word, a key word base language semantic vector, the key word being a taboo word; determine a set of distance data including a first distance between the key word base language semantic vector and the first base language semantic vector, and a second distance between the key word base language semantic vector and the second base language semantic vector; determine whether the first distance and the second distance are less than a threshold distance; when the first and second distances are less than the threshold distance, add the first distance and the second distance to obtain a sum; determine a score of the electronic content based at least in part on the sum, wherein the score indicates a relevance of the electronic content to the key word; and determine that the electronic content is not to be delivered to the destination address based at least in part on the score of the electronic content; and prevent the electronic content from being delivered to the destination address.
8. The system of claim 7, wherein the computer-executable instructions further cause the at least one processor to determine the first base language semantic vector by: determining a first native language semantic vector corresponding to the first word, wherein the first word is in a native language and the first native language semantic vector is defined in a native language semantic vector space corresponding to a native language of the first word; identifying a native language-to-base language translation matrix corresponding to the native language; and transforming, based at least in part on the native language-to-base language translation matrix, the first native language semantic vector to the first base language semantic vector.
9. The system of claim 7, wherein the key word is associated with at least one of: pornography; sexually explicit content; violent content; adult content; gambling related content; gaming related content; or violent content.
10. The system of claim 7, wherein the computer-executable instructions further cause the at least one processor to determine a key word base language semantic vector by identifying that the key word is received in a base language corresponding to the key word base language semantic vector.
11. The system of claim 7, wherein the electronic content is first electronic content, the first electronic content includes a first document, the first document includes a first plurality of words, the set of distance data is a first set of distance data, and the computer-executable instructions further cause the at least one processor to: determine a third base language semantic vector corresponding to a third word included in a second document included in second electronic content; determine a fourth base language semantic vector corresponding to a fourth word included in the second document; determine a second set of distance data corresponding to the second document, wherein the second set of distance data includes a third distance between the third base language semantic vector and the key word base language semantic vector, and a fourth distance between the fourth base language s
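
The filtering steps recited in claims 1 and 7 (per-word distance to the taboo key word, threshold test, sum, score, delivery decision) can be sketched as follows. The claims only require that the score be based at least in part on the sum, so the scoring formula, the `block_score` cutoff, the function names, and the toy vectors below are all assumptions made for illustration. The Euclidean distance is one of the two options named in claim 5.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def score_content(word_vecs, keyword_vec, threshold):
    """Sum the distances below the threshold and derive a relevance score.

    Assumed formula (not from the patent): each near word contributes
    (threshold - distance), so more and closer words give a higher score.
    """
    distances = (euclidean_distance(w, keyword_vec) for w in word_vecs)
    near = [d for d in distances if d < threshold]
    total = sum(near)
    score = len(near) * threshold - total
    return total, score

def should_block(score, block_score):
    """Hypothetical decision rule: block delivery at or above the cutoff."""
    return score >= block_score

# Hypothetical base-language vectors for three document words and one key word.
keyword_vec = [1.0, 0.0]
word_vecs = [[1.0, 0.0], [1.0, 0.5], [0.0, 5.0]]

total, score = score_content(word_vecs, keyword_vec, threshold=1.0)
blocked = should_block(score, block_score=1.0)
```

Here two of the three words fall within the threshold (distances 0.0 and 0.5), so their sum is 0.5 and, under the assumed formula, the score is 1.5, which meets the assumed cutoff and marks the content as not to be delivered.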
Search customisation based on user profiles and personalisation · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Language identification · CPC title
using vector based model · CPC title
Translation of the query language, e.g. Chinese to English · CPC title