US9305079B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9305079-B2 |
| Application number | US-201313957313-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2013 |
| Priority date | Jun 23, 2003 |
| Publication date | Apr 5, 2016 |
| Grant date | Apr 5, 2016 |
Official abstract text for this publication.
The subject invention provides for an advanced and robust system and method that facilitates detecting spam. The system and method include components as well as other operations which enhance or promote finding characteristics that are difficult for the spammer to avoid and finding characteristics in non-spam that are difficult for spammers to duplicate. Exemplary characteristics include analyzing character and/or number sequences, strings, and sub-strings, detecting various entropy levels of one or more character sequences, strings and/or sub-strings and analyzing message headers.
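The entropy-based characteristic mentioned in the abstract can be illustrated with a short sketch. This is not the patent's implementation: it assumes that "entropy" means Shannon entropy of the empirical character distribution, and that the average for a message portion is taken over sliding windows of the same length as the run. The function names `entropy_per_char`, `average_window_entropy`, and `relative_entropy_feature` are hypothetical.

```python
import math
from collections import Counter


def entropy_per_char(run: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `run`.

    Assumption: the empirical character frequencies of the run itself
    stand in for the "degree of randomness".
    """
    if not run:
        return 0.0
    n = len(run)
    counts = Counter(run)
    # sum of p * log2(1/p) over distinct characters
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def average_window_entropy(portion: str, run_length: int) -> float:
    """Mean entropy over all sliding windows of `run_length` in `portion`."""
    if run_length <= 0 or len(portion) < run_length:
        return entropy_per_char(portion)
    windows = [
        portion[i:i + run_length]
        for i in range(len(portion) - run_length + 1)
    ]
    return sum(entropy_per_char(w) for w in windows) / len(windows)


def relative_entropy_feature(run: str, portion: str) -> float:
    """Run entropy minus the average entropy of the surrounding portion.

    A clearly positive value suggests the run looks more random than
    the text around it.
    """
    return entropy_per_char(run) - average_window_entropy(portion, len(run))
```

A value like this would be one numeric input among many to a learned filter. On very short runs the empirical estimate is coarse, so a practical system would likely need longer runs or a language-model-based estimate.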
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for filtering messages, comprising: receiving a first electronic mail (email) message; analyzing a portion of the first email message by searching for character sequences that are indicative of spam, wherein the character sequences correspond to one or more runs of characters of a particular run length including individual lengths of characters and sub-lengths of characters that are not restricted to whole words or space-separated words; determining a degree of randomness associated with an individual character sequence of the character sequences; generating a feature relating to the individual character sequence based at least partly on the degree of randomness associated with the individual character sequence; training a machine learning filter using at least the feature to generate a trained machine learning filter; employing the trained machine learning filter to obtain a verdict as to whether one or more features of a second email message indicate that the second email message is likely to be spam, and filtering the second email message based at least in part on the verdict.
2. A method as recited in claim 1, wherein the character sequences comprise character n-grams that are indicative of spam-like messages.
3. A method as recited in claim 2, wherein the character n-grams are located in at least one of a from address, a subject line, a text body, an html body, or an attachment.
4. A method as recited in claim 1, wherein the first email message comprises at least one of foreign language text, Unicode character types, or other character types not common to English.
5. A method as recited in claim 4, wherein the foreign language text comprises substantially non-space separated words.
6. A method as recited in claim 1, wherein the character sequences that are indicative of spam comprise strings of random characters.
7. A method as recited in claim 1, wherein the analyzing the portion of the first email message comprises processing at least a portion of the first email message in which the individual character sequence occurs.
8. A method as recited in claim 7, wherein the processing at least the portion of the first email message comprises determining an average degree of randomness associated with the portion of the first email message, and wherein the feature relates to a comparison between the degree of randomness associated with the individual character sequence and the average degree of randomness associated with the portion of the first email message.
9. A method as recited in claim 7, further comprising calculating an entropy of a particular run of characters of the one or more runs of characters and employing the entropy as an additional feature in connection with training the machine learning filter.
10. A method as recited in claim 9, wherein the entropy is an average entropy calculated as an entropy per character of the particular run of characters.
11. A method as recited in claim 9, wherein the entropy is a relative entropy determined by a comparison of a first entropy of a first particular run of characters at a first location within the first email message relative to a second entropy of a second particular run of characters at a second location within the first email message.
12. A method as recited in claim 11, wherein the first location and the second location comprise one of a beginning of a message body of the first email message, a middle of the message body of the first email message, or an end of the message body of the first email message, wherein the first location is different from the second location.
13. A method as recited in claim 1, wherein employing the trained machine learning filter to obtain the verdict as to whether the one or more features of the second email message indicate that the second email message is likely to be spam comprises: receiving the second email message; generating the one or more features of the second email message based on at least one of one or more runs of characters in the second email message and entropy determinations of the one or more runs of characters in the second email message; passing the one or more features of the second email message through the trained machine learning filter; and obtaining the verdict as to whether the one or more features of the second email message indicate that the second email message is likely to be spam.
14. A computer-implemented method for filtering messages, comprising: receiving a first electronic mail (email) message; analyzing one or more features of a message header associated with the first email message; analyzing a portion of the first email message by searching for character sequences that are indicative of spam, the character sequences corresponding to one or more runs of characters of a particular run length; determining a degree of randomness for an individual run of characters of the one or more runs of characters; determining an average degree of randomness for the portion of the first email message within which the individual run of characters occurs; generating a feature relating to the individual run of characters based at least in part on a comparison between the degree of randomness and the average degree of randomness; and training a machine learning spam filter using the feature to generate a trained machine learning spam filter; employing the trained machine learning spam filter to obtain a verdict as to whether one or more features of a second email message indicate that the second email message is likely to be spam, and filtering the second email message based at least in part on the verdict.
15. A method as recited in claim 14, wherein the one or more features of the message header comprise at least one of a presence or absence of at least one message header type, the at least one message header type comprising at least one of X-Priority, mail software, or a header line for unsubscribing.
16. A method as recited in claim 15, wherein the one or more features of the message header further comprise content associated with the at least one message header type.
17. A method as recited in claim 14, further comprising: analyzing at least a portion of the first email message for images and related image information; generating image features relating to one of the images and the related image information; and further training the machine learning spam filter using the image features.
18. A method as recited in claim 17, wherein the related image information comprises one or more of image size, image quantity, location of image, image dimensions, or image type.
19. A method as recited in claim 17, wherein the related image information comprises a first URL and a second URL such that the image is represented within a hyperlink.
20. A computer storage device having computer executable instructions stored thereon, which, when executed by one or more processors, cause the one or more processors to: analyze a first portion of a first electronic mail (email) message by searching for particular character sequences that are indicative of spam, wherein the particular character sequences correspond to one or more runs of characters of a particular run length; analyze a second portion of the first email message by searching for instances of strings of random characters that are indicative of spam; analyze a message header associated with the first email message; determining a degree of randomness associated with at
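Two of the feature types recited in the claims, character n-grams not restricted to word boundaries (claims 1 and 2) and the presence or absence of header types (claim 15), can be sketched as follows. This is an illustration under assumptions, not the claimed implementation. In particular, mapping "mail software" to the `X-Mailer` header and "a header line for unsubscribing" to `List-Unsubscribe` is a guess, and the names `char_ngrams`, `HEADER_TYPES`, and `header_presence_features` are hypothetical.

```python
from email import message_from_string
from typing import Dict, List

# Assumed header names; the claim text names only "X-Priority" explicitly.
HEADER_TYPES = ("X-Priority", "X-Mailer", "List-Unsubscribe")


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """All overlapping character n-grams of `text`.

    The window slides one character at a time and ignores spaces and
    word boundaries, so it also works on text without space-separated
    words.
    """
    if n <= 0:
        return []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def header_presence_features(raw_message: str) -> Dict[str, bool]:
    """Map each header type of interest to whether the message has it."""
    msg = message_from_string(raw_message)
    # Message lookup is case-insensitive and returns None when absent.
    return {name: msg[name] is not None for name in HEADER_TYPES}
```

For example, `char_ngrams("v1agra", 3)` yields `["v1a", "1ag", "agr", "gra"]`, so an obfuscated spelling still shares n-grams with the word it imitates. Both outputs would typically be encoded as sparse features for the learned filter.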
Computer-aided management of electronic mailing [e-mailing] · CPC title
of unstructured textual data (document management systems G06F16/93) · CPC title
Business processes related to postal services (shipping G06Q10/083; franking apparatus G07B17/00) · CPC title
using filtering or selective blocking · CPC title
Business processes related to the communications industry (charging, metering or billing arrangements specially adapted for data communications H04L12/14; telephonic communication involving automatic or semi-automatic exchanges H04M3/00; arrangements for metering, time-control or time indication H04M15/00; prepayment telephone systems H04M17/00; accounting or billing for wireless communication networks H04W4/24) · CPC title