Identifying phishing communications using templates
US-2016337401-A1 · Nov 17, 2016 · US
US2017149824A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017149824-A1 |
| Application number | US-201715416632-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 26, 2017 |
| Priority date | May 13, 2015 |
| Publication date | May 25, 2017 |
| Grant date | — |
Official abstract text for this publication.
Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a legitimate entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more legitimate entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.
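As a rough illustration of the flow the abstract describes, the Python sketch below matches a message body to templates and then checks whether the sender address fits the matched entity. It is a minimal sketch under stated assumptions, not the patent's implementation: the `Template` class, the `is_phishing` function, the use of Jaccard similarity over word bigrams, the 0.5 threshold, and the `.example` domains are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    entity: str           # hypothetical name of a legitimate entity
    content: str          # text shared across that entity's cluster of messages
    address_pattern: str  # regex describing addresses affiliated with the entity


def ngrams(text, n=2):
    """Return the set of overlapping word n-grams in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b):
    """Jaccard similarity of the n-gram sets (an assumed similarity measure)."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def is_phishing(body, sender, templates, threshold=0.5, top_k=1):
    """Flag a message that looks like a known template but comes from
    an address that fits none of the matched entities' patterns."""
    scored = sorted(
        ((similarity(body, t.content), t) for t in templates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    matches = [t for score, t in scored[:top_k] if score >= threshold]
    if not matches:
        return False  # resembles no known template, so this check does not apply
    return not any(re.fullmatch(t.address_pattern, sender) for t in matches)


templates = [
    Template(
        entity="Example Bank",
        content="Your Example Bank statement is ready. Sign in to view your statement.",
        address_pattern=r".*@examplebank\.example",
    )
]
body = "Your Example Bank statement is ready. Sign in to view your statement."
genuine = is_phishing(body, "alerts@examplebank.example", templates)
spoofed = is_phishing(body, "alerts@examp1ebank.example", templates)
```

Here `genuine` is `False` and `spoofed` is `True`: both messages match the template, but only the first sender fits the entity's address pattern. The same check could be applied to a reply-to address or a linked-to network address instead of the sender.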
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: performing, by one or more processors, a first comparison between content of a communication to content of a plurality of templates, wherein each template represents content shared among a cluster of communications sent by a known legitimate entity; identifying, by one or more of the processors based on the first comparison, one or more matching templates from the plurality of templates, wherein the one or more matching templates are associated with one or more known legitimate entities; performing, by one or more of the processors, a second comparison of an address associated with the communication with one or more respective address patterns associated with the one or more known legitimate entities; determining, by one or more of the processors based on the second comparison, that the address associated with the communication does not match the one or more respective address patterns; classifying, by one or more of the processors based on the determining, the communication as a phishing attempt; and discarding or re-routing, by one or more of the processors, the communication based on the classifying.

2. The computer-implemented method of claim 1, wherein the determining further comprises, for each of the one or more legitimate entities associated with the one or more matching templates, comparing: a combination of the address associated with the communication and a subject of the communication; to a combination of a pattern of addresses associated with the legitimate entity and a pattern found among subjects of communications sent by the legitimate entity.

3. The computer-implemented method of claim 1, wherein performing the first comparison comprises determining respective measures of similarity of the plurality of templates to the communication.

4. The computer-implemented method of claim 3, further comprising: ranking the plurality of templates based on their respective measures of similarity; and selecting, as the one or more matching templates, a predetermined number of highest ranking templates.

5. The computer-implemented method of claim 1, wherein the address is a linked-to network address contained in the communication.

6. The computer-implemented method of claim 1, wherein the address is a sender address.

7. The computer-implemented method of claim 1, wherein the address is a reply-to address.

8. The computer-implemented method of claim 1, wherein the first comparison comprises comparing one or more n-grams in the communication to one or more n-grams used to index the plurality of templates.

9. The computer-implemented method of claim 8, wherein the one or more n-grams used to index the plurality of templates are extracted from the content of the plurality of templates.

10. The computer-implemented method of claim 1, wherein the first comparison comprises comparing one or more overlapping n-grams in the communication to one or more overlapping n-grams used to index the plurality of templates.

11. A computer-implemented method, comprising: matching, by one or more processors, a communication to a first subset of a plurality of templates using a forward index, wherein each template of the plurality of templates represents content shared among a cluster of communications sent by a known legitimate entity, and wherein the forward index is indexed on metadata associated with the plurality of templates; matching, by one or more of the processors, the communication to a second subset of the plurality of templates using a reverse index, wherein the reverse index is indexed on content of the plurality of templates; determining, by one or more of the processors, that there is no intersection between the first subset and the second subset; classifying, by one or more of the processors based on the determining, the communication as a phishing attempt; and discarding or re-routing, by one or more of the processors, the communication based on the classifying.

12. The computer-implemented method of claim 11, wherein the metadata associated with each of the plurality of templates comprises a sender address associated with a respective legitimate entity.

13. The computer-implemented method of claim 11, wherein the metadata associated with each of the plurality of templates comprises a reply-to address associated with a respective legitimate entity.

14. The computer-implemented method of claim 11, wherein the metadata associated with each of the plurality of templates comprises a combination of a sender address and a subject associated with a respective legitimate entity.

15. The computer-implemented method of claim 11, wherein the metadata associated with each of the plurality of templates comprises a subject associated with a respective legitimate entity.

16. The computer-implemented method of claim 11, wherein the metadata associated with each of the plurality of templates comprises a pattern of sender addresses associated with a respective legitimate entity.

17. The computer-implemented method of claim 11, wherein the indexed content of the plurality of templates includes one or more n-grams associated with a respective legitimate entity.

18. A computer-implemented method, comprising: grouping, by one or more processors, a corpus of communications sent by a plurality of known legitimate entities into a plurality of clusters based at least in part on metadata associated with the corpus of communications, wherein each cluster includes communications sent by a known legitimate entity; generating, by one or more of the processors based on the plurality of clusters, a plurality of templates, wherein each template of the plurality of templates represents content shared among a cluster of the plurality of clusters that includes communications sent by a known legitimate entity; creating, by one or more of the processors, a forward index that is indexed on metadata associated with the plurality of templates; creating, by one or more of the processors, a reverse index, wherein the reverse index is indexed on content of the plurality of templates; and discarding or re-rerouting subsequent communications that match one or more templates in one of the forward or reverse indices, but not the other.

19. The computer-implemented method of claim 18, wherein the metadata associated with each of the plurality of templates comprises a combination of a sender address and a subject associated with a respective legitimate entity.

20. The computer-implemented method of claim 18, wherein the metadata associated with each of the plurality of templates comprises a subject associated with a respective legitimate entity.
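The forward-index and reverse-index approach of claims 11 and 18 can be sketched in Python as follows. This is an illustrative reading under stated assumptions rather than the claimed implementation: the function names `build_indices` and `classify`, the use of an exact sender address as the metadata key, word trigrams as the content key, the `min_shared` cutoff, the "unknown" outcome for messages that hit neither index, and the `.example` addresses are all hypothetical.

```python
import re
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of overlapping word n-grams in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_indices(templates):
    """Build both indices from {template_id: (sender_address, content)}.

    forward: metadata (here, sender address) -> template ids
    reverse: content n-gram -> template ids
    """
    forward = defaultdict(set)
    reverse = defaultdict(set)
    for tid, (sender, content) in templates.items():
        forward[sender.lower()].add(tid)
        for gram in ngrams(content):
            reverse[gram].add(tid)
    return forward, reverse


def classify(sender, body, forward, reverse, min_shared=2):
    """Compare the template subsets found through each index."""
    by_metadata = set(forward.get(sender.lower(), ()))
    shared = defaultdict(int)
    for gram in ngrams(body):
        for tid in reverse.get(gram, ()):
            shared[tid] += 1
    by_content = {tid for tid, count in shared.items() if count >= min_shared}
    if not by_metadata and not by_content:
        return "unknown"     # assumption: no template evidence either way
    if by_metadata & by_content:
        return "legitimate"  # the same template is found through both indices
    return "phishing"        # matched through one index but not the other


templates = {
    "t1": ("alerts@examplebank.example",
           "Your Example Bank statement is ready to view online"),
    "t2": ("orders@shop.example",
           "Your order has shipped and will arrive soon"),
}
forward, reverse = build_indices(templates)
body = "Your Example Bank statement is ready to view online"
genuine = classify("alerts@examplebank.example", body, forward, reverse)
spoofed = classify("alerts@evil.example", body, forward, reverse)
```

With this data, `genuine` is `"legitimate"` and `spoofed` is `"phishing"`, because the spoofed message matches template `t1` by content but no template by sender address. A production system would more likely key the forward index on address patterns or sender-plus-subject combinations, as the dependent claims suggest.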
Traffic logging, e.g. anomaly detection · CPC title
service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title
for managing network security; network security policies in general (filtering policies H04L63/0227) · CPC title
Stateful filtering · CPC title