Use of artificial intelligence techniques to identify possible inadvertent data disclosures in emails
US-2024422114-A1 · Dec 19, 2024 · US
US8938508B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-8938508-B1 |
| Application number | US-84155910-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jul 22, 2010 |
| Priority date | Jul 22, 2010 |
| Publication date | Jan 20, 2015 |
| Grant date | Jan 20, 2015 |
Official abstract text for this publication.
A computer correlates web and email attributes to detect spam. A security module on a client collects attributes of a web site to which an email address was submitted and attributes of an email message sent to the email address that was previously submitted. The security module analyzes the attributes of the web site and the email message to determine whether the email message was sent to the email address responsive to the submission of the email address to the web site. Based on the analysis, the security module determines whether the email message is spam. A machine learning module on a security server establishes training data describing the attributes of the web site to which email addresses were submitted and attributes of legitimate emails received in response to the address submissions. The machine learning module generates an attributes classifier for the security module for spam detection.
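The correlation step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Attributes` container, the four match features, the fixed `WEIGHTS` values, and the `threshold`. In the patent the weights come from a trained classifier rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    """Hypothetical container for attributes of a web site or an email."""
    dns_name: str    # primary attribute: DNS name of the web or mail server
    ip_address: str  # primary attribute: IPv4 address of the server
    country: str     # secondary attribute: geolocation derived from the IP
    registrar: str   # secondary attribute: registrar of the DNS name


def registered_domain(dns_name: str) -> str:
    """Crude approximation: keep only the last two labels of the DNS name."""
    return ".".join(dns_name.lower().split(".")[-2:])


def match_features(site: Attributes, mail: Attributes) -> list[float]:
    """Return 1.0 for each attribute pair that agrees, otherwise 0.0."""
    return [
        float(registered_domain(site.dns_name) == registered_domain(mail.dns_name)),
        # Same /24 network: compare the first three octets of the addresses.
        float(site.ip_address.split(".")[:3] == mail.ip_address.split(".")[:3]),
        float(site.country == mail.country),
        float(site.registrar == mail.registrar),
    ]


# Illustrative hand-picked weights (assumption); the patent learns them.
WEIGHTS = [0.5, 0.2, 0.1, 0.2]


def correlation(site: Attributes, mail: Attributes, weights=WEIGHTS) -> float:
    """Weighted sum of the match features: the degree of correlation."""
    return sum(w * f for w, f in zip(weights, match_features(site, mail)))


def is_spam(site: Attributes, mail: Attributes, threshold: float = 0.5) -> bool:
    """A stronger correlation means a lower likelihood of spam."""
    return correlation(site, mail) < threshold
```

An email whose mail server shares the registered domain, network, country, and registrar of the site where the address was submitted scores near 1.0 and is treated as a response to the submission. An email that matches none of them scores 0.0 and is flagged.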
Opening claim text (preview).
What is claimed is:

1. A method of detecting spam email messages comprising: using a computer to perform steps comprising: collecting attributes of a web site to which an email address was submitted; collecting attributes of an email message sent to the email address; identifying a degree of correlation between at least one of the collected attributes of the web site and at least one of the collected attributes of the email message, the identifying comprising using a classifier to analyze the at least one collected attribute of the web site and the at least one collected attribute of the email message, wherein the analysis is based at least in part on a plurality of weights describing different values that represent the relative importances of the collected attributes of the web site and email message, wherein the classifier is generated by training on training data describing attributes of training web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the training web sites, generating the classifier comprising: generating feature vectors from the training data, the feature vectors having features describing the attributes of the training web sites and having features describing the attributes of the legitimate emails received responsive to the submissions of the email addresses to the training web sites; and training the classifier using the feature vectors, the training causing the classifier to learn weights describing relative importances of the features in recognizing when email messages are received in response to email addresses submitted to web sites; and determining whether the email message is spam responsive at least in part to the degree of correlation, a stronger correlation indicating a decreased likelihood that the email message is spam.

2. The method of claim 1, wherein collecting attributes of the web site to which an email address was submitted comprises: collecting one or more primary attributes describing the web site; and collecting one or more secondary attributes derived from the primary attributes.

3. The method of claim 2, wherein the primary attributes describing the web site comprise at least one of an Internet Protocol (IP) address and a Domain Name System (DNS) name of a web server hosting the web site.

4. The method of claim 2, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of a web server hosting the web site, whether an IP address of the web server is known to be associated with an Internet Service Provider (ISP), information about a domain name registrar at which the DNS name for the web server is registered, and information about a registrant of the DNS name.

5. The method of claim 1, wherein collecting attributes of an email message sent to the email address comprises: collecting one or more primary attributes describing the email message; and collecting one or more secondary attributes derived from the primary attributes.

6. The method of claim 5, wherein the primary attributes describing the email message comprise at least one of a DNS name of a “from” address of the email message, an IP address of a mail server involved in sending the email message, a DNS name of the mail server involved in sending the email message, and attributes of a mail session involved in transmitting the email message.

7. The method of claim 5, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of the mail server involved in sending the email message, whether the IP address of the mail server is known to be associated with an ISP, information about a domain name registrar at which the DNS of the web server is registered, and information about a registrant of the DNS name.

8. A non-transitory computer-readable storage medium storing executable computer program instructions for detecting spam email messages, the computer program instructions comprising instructions for: collecting attributes of a web site to which an email address was submitted; collecting attributes of an email message sent to the email address; identifying a degree of correlation between at least one of the collected attributes of the web site and at least one of the collected attributes of the email message, the identifying comprising using a classifier to analyze the at least one collected attribute of the web site and the at least one collected attribute of the email message, wherein the analysis is based at least in part on a plurality of weights describing different values that represent the relative importances of the collected attributes of the web site and email message, wherein the classifier is generated by training on training data describing attributes of training web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the training web sites, generating the classifier comprising: generating feature vectors from the training data, the feature vectors having features describing the attributes of the training web sites and having features describing the attributes of the legitimate emails received responsive to the submissions of the email addresses to the training web sites; and training the classifier using the feature vectors, the training causing the classifier to learn weights describing relative importances of the features in recognizing when email messages are received in response to email addresses submitted to web sites; and determining whether the email message is spam responsive at least in part to the degree of correlation, a stronger correlation indicating a decreased likelihood that the email message is spam.

9. The computer-readable storage medium of claim 8, wherein the computer program instructions for collecting attributes of the web site to which an email address was submitted comprise instructions for: collecting one or more primary attributes describing the web site; and collecting one or more secondary attributes derived from the primary attributes.

10. The computer-readable storage medium of claim 9, wherein the primary attributes describing the web site comprise at least one of an IP address and a DNS name of a web server hosting the web site.

11. The computer-readable storage medium of claim 9, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of a web server hosting the web site, whether an IP address of the web server is known to be associated with an ISP, information about a domain name registrar at which the DNS name for the web server is registered, and information about a registrant of the DNS name.

12. The computer-readable storage medium of claim 8, wherein the computer program instructions for collecting attributes of an email message sent to the email address comprise instructions for: collecting one or more primary attributes describing the email message; and collecting one or more secondary attributes derived from the primary attributes.

13. The computer-readable storage medium of claim 12, wherein the primary attributes describing the email message comprise at least one of a DNS name of a “from” address of the email message, an IP address of a mail server involved in sending the email message, a DNS name of the mail server involved in sending the email message, and attributes of a mail session involved in transmitting the email message.

14. Th
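The training step recited in claims 1 and 8 (feature vectors in, learned weights out) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the claims do not name a learning algorithm, so logistic regression with plain gradient descent is swapped in here, and the feature layout, the toy data, and the use of unrelated site/email pairs as negative examples are all hypothetical.

```python
import math

# Hypothetical feature layout (assumption): each vector records whether a
# web-site attribute matches the corresponding email attribute.
# [same_domain, same_subnet, same_country, same_registrar]
TRAIN_VECTORS = [
    [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0],  # responsive
    [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0],  # unrelated
]
# 1 = legitimate email received in response to a submission, 0 = unrelated.
# Negative examples are an assumption; the claims describe only legitimate
# training emails.
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(vectors, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with per-example gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def score(x, weights, bias) -> float:
    """Estimated probability that the email answers the submission."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


weights, bias = train_weights(TRAIN_VECTORS, TRAIN_LABELS)
```

Each learned weight plays the role of a "relative importance" of one feature. A high `score` corresponds to a strong correlation and therefore a lower likelihood that the email is spam.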
using filtering or selective blocking · CPC title
Computer-aided management of electronic mailing [e-mailing] · CPC title