Method and device for identifying URL legitimacy

US2017126723A1 · US · A1

Patent metadata
Publication number: US-2017126723-A1
Application number: US-201615275303-A
Country: US
Kind code: A1
Filing date: Sep 23, 2016
Priority date: Oct 30, 2015
Publication date: May 4, 2017
Grant date: not listed

Abstract

The present invention provides a method and device for identifying URL legitimacy. A URL to be identified is obtained; based on it, a corresponding legitimate URL is obtained as a comparison object; a degree of similarity between the URL to be identified and the comparison object is calculated; and the legitimacy of the URL to be identified is identified based on that degree of similarity. This allows illegitimate URLs to be discovered in a timely manner, improving the safety of information processing.
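The abstract's four steps, combined with the inverted-index lookup that claims 2 to 4 describe (domain name, prefix and suffix removed to leave an "essential word", N-gram segmentation, inverted index), can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: the tiny suffix list stands in for a real public-suffix list, n=2 is an arbitrary N-gram size, the "degree of similarity" is assumed to be a normalized Levenshtein similarity, and every function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical stand-in for a public-suffix list; the patent does not say
# how prefixes and suffixes are recognised.
SUFFIXES = ("com.cn", "com", "cn", "net", "org")


def essential_word(url: str) -> tuple[str, str]:
    """Split a URL's host into (essential word, suffix), e.g. www.baidu.com -> (baidu, com)."""
    host = urlsplit(url if "://" in url else "http://" + url).hostname or ""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest suffix first
        if host.endswith("." + suffix):
            # Keep the label just before the suffix; this drops 'www.' and other prefix labels.
            return host[: -len(suffix) - 1].split(".")[-1], suffix
    labels = host.split(".")
    return (labels[-2], labels[-1]) if len(labels) > 1 else (host, "")


def ngrams(word: str, n: int = 2) -> set[str]:
    """Character N-grams of a word (n=2 is an assumed default)."""
    if len(word) < n:
        return {word}
    return {word[i : i + n] for i in range(len(word) - n + 1)}


def build_index(legit_urls: list[str], n: int = 2):
    """Inverted index mapping each N-gram to the positions of legitimate URLs containing it."""
    entries = [(url, *essential_word(url)) for url in legit_urls]
    index: dict[str, set[int]] = defaultdict(set)
    for i, (_, word, _) in enumerate(entries):
        for gram in ngrams(word, n):
            index[gram].add(i)
    return entries, index


def similarity(a: str, b: str) -> float:
    """Assumed measure: (longer length - Levenshtein distance) / longer length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    longest = max(len(a), len(b)) or 1
    return (longest - prev[-1]) / longest


def comparison_object(url: str, entries, index, n: int = 2):
    """Return (legitimate URL, similarity, suffixes match?) or None if no N-gram is shared."""
    word, suffix = essential_word(url)
    candidates = set().union(*(index.get(g, set()) for g in ngrams(word, n)))
    if not candidates:
        return None
    best = max(sorted(candidates), key=lambda i: similarity(word, entries[i][1]))
    legit_url, legit_word, legit_suffix = entries[best]
    return legit_url, similarity(word, legit_word), suffix == legit_suffix


if __name__ == "__main__":
    entries, index = build_index(["https://www.baidu.com", "https://www.example.org"])
    print(comparison_object("http://www.baidv.com/login", entries, index))
    # ('https://www.baidu.com', 0.8, True)
```

The index keeps the lookup cheap: only legitimate URLs that share at least one N-gram with the queried essential word are scored, rather than every entry in the collection.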

Claims

We claim:

1. A method for identifying URL legitimacy, wherein the method comprises: obtaining a URL to be identified; obtaining, based on the URL to be identified, a legitimate URL corresponding to the URL to be identified as a comparison object; calculating a degree of similarity between the URL to be identified and the comparison object; and identifying the legitimacy of the URL to be identified based on the degree of similarity.

2. The method according to claim 1, wherein the step of obtaining, based on the URL to be identified, a legitimate URL corresponding to the URL to be identified as a comparison object comprises: obtaining, based on the URL to be identified and an inverted index of legitimate URLs, a legitimate URL corresponding to the URL to be identified as the comparison object.

3. The method according to claim 2, wherein, before the step of obtaining, based on the URL to be identified and an inverted index of legitimate URLs, a legitimate URL corresponding to the URL to be identified as the comparison object, the method comprises: collecting at least one legitimate URL; carrying out word segmentation on each of the legitimate URLs of the at least one legitimate URL with an N-gram model, so as to obtain a segmentation result; and obtaining the inverted index of legitimate URLs based on each of the legitimate URLs and the segmentation result of each of the legitimate URLs.

4. The method according to claim 3, wherein the step of carrying out word segmentation on each of the legitimate URLs of the at least one legitimate URL with an N-gram model, so as to obtain a segmentation result, comprises: obtaining the domain name of each of the legitimate URLs based on each of the legitimate URLs; removing the prefix and suffix of the domain name of each of the legitimate URLs, so as to obtain an essential word of each of the legitimate URLs; and carrying out word segmentation on the essential word of each of the legitimate URLs with an N-gram model, so as to obtain the segmentation result.

5. The method according to claim 1, wherein the step of identifying the legitimacy of the URL to be identified based on the degree of similarity comprises: identifying the URL to be identified as a legitimate URL if the degree of similarity is equal to 1 and the suffix of the URL to be identified is consistent with the suffix of the comparison object; or identifying the URL to be identified as a suspected illegitimate URL if the degree of similarity is equal to 1 and the suffix of the URL to be identified is inconsistent with the suffix of the comparison object; or identifying the URL to be identified as an illegitimate URL if the degree of similarity is greater than or equal to a first threshold value and less than 1; identifying the URL to be identified as a suspected illegitimate URL if the degree of similarity is greater than or equal to a second threshold value and less than the first threshold value, wherein the second threshold value is less than the first threshold value; identifying the URL to be identified as a legitimate URL if the degree of similarity is less than the second threshold value or equal to 1.

6. The method according to claim 5, wherein, before the step of identifying the legitimacy of the URL to be identified based on the degree of similarity, the method further comprises: carrying out legitimacy identification processing on at least one sample URL with the at least one legitimate URL, so as to obtain an identification result; and obtaining the first threshold value and the second threshold value based on the identification result and a labeling result of each of the sample URLs of the at least one sample URL.

7. The method according to claim 1, wherein, after the step of identifying the legitimacy of the URL to be identified based on the degree of similarity, the method further comprises: sending the identification result to a terminal so that the terminal displays the identification result, and/or the terminal allows or prohibits, based on the identification result, executing access operations based on the URL to be identified.

8. A nonvolatile computer storage medium storing one or more programs which, when executed by an apparatus, cause the apparatus to execute the following operations: obtaining a URL to be identified; obtaining, based on the URL to be identified, a legitimate URL corresponding to the URL to be identified as a comparison object; calculating a degree of similarity between the URL to be identified and the comparison object; and identifying the legitimacy of the URL to be identified based on the degree of similarity.

9. The nonvolatile computer storage medium according to claim 8, wherein the operation of obtaining, based on the URL to be identified, a legitimate URL corresponding to the URL to be identified as a comparison object comprises: obtaining, based on the URL to be identified and an inverted index of legitimate URLs, a legitimate URL corresponding to the URL to be identified as the comparison object.

10. The nonvolatile computer storage medium according to claim 9, wherein, before the operation of obtaining, based on the URL to be identified and an inverted index of legitimate URLs, a legitimate URL corresponding to the URL to be identified as the comparison object, the one or more programs cause the apparatus to further execute the following operations: collecting at least one legitimate URL; carrying out word segmentation on each of the legitimate URLs of the at least one legitimate URL with an N-gram model, so as to obtain a segmentation result; and obtaining the inverted index of legitimate URLs based on each of the legitimate URLs and the segmentation result of each of the legitimate URLs.

11. The nonvolatile computer storage medium according to claim 10, wherein the operation of carrying out word segmentation on each of the legitimate URLs of the at least one legitimate URL with an N-gram model, so as to obtain a segmentation result, comprises: obtaining the domain name of each of the legitimate URLs based on each of the legitimate URLs; removing the prefix and suffix of the domain name of each of the legitimate URLs, so as to obtain an essential word of each of the legitimate URLs; and carrying out word segmentation on the essential word of each of the legitimate URLs with an N-gram model, so as to obtain the segmentation result.

12. The nonvolatile computer storage medium according to claim 8, wherein the operation of identifying the legitimacy of the URL to be identified based on the degree of similarity comprises: identifying the URL to be identified as a legitimate URL if the degree of similarity is equal to 1 and the suffix of the URL to be identified is consistent with the suffix of the comparison object; or identifying the URL to be identified as a suspected illegitimate URL if the degree of similarity is equal to 1 and the suffix of the URL to be identified is inconsistent with the suffix of the comparison object; or identifying the URL to be identified as an illegitimate URL if the degree of similarity is greater than or equal to a first threshold value and less than 1; identifying the URL to be identified as a suspected illegitimate URL if the degree of similarity is greater than or equal to a second threshold value and less than the first threshold value, wherein the second threshold value is less than the first threshold value; identifying the URL to be identified as a legitimate URL if the degree of similarity is less than the second threshold value or equal to 1.

13. The nonvolatile computer storage medium according to
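The three-way decision rule of claims 5 and 12, and the threshold fitting of claim 6, can be sketched in Python as follows. This is a hedged illustration rather than the applicant's method: the claims do not say how the two thresholds are derived from the labeling results, so the sample format `(similarity, suffix_match, label)`, the label strings, the 0.05 grid and the use of plain accuracy as the fitting criterion are all assumptions, and the function names are hypothetical.

```python
LEGIT, SUSPECT, ILLEGIT = "legitimate", "suspected illegitimate", "illegitimate"


def classify(sim: float, suffix_match: bool, t1: float, t2: float) -> str:
    """Decision rule of claims 5 and 12, assuming 0 < t2 < t1 < 1."""
    if sim == 1.0:
        # Identical essential word: the suffix decides (e.g. same name, other TLD).
        return LEGIT if suffix_match else SUSPECT
    if sim >= t1:
        return ILLEGIT  # very close to, but not equal to, a legitimate name
    if sim >= t2:
        return SUSPECT
    return LEGIT  # too dissimilar to be imitating the comparison object


def accuracy(samples, t1: float, t2: float) -> float:
    """Fraction of labelled samples (sim, suffix_match, label) that classify() reproduces."""
    hits = sum(classify(s, m, t1, t2) == label for s, m, label in samples)
    return hits / len(samples)


def calibrate(samples, steps: int = 20) -> tuple[float, float]:
    """Assumed claim-6 procedure: grid-search (t1, t2) with t2 < t1 maximizing accuracy."""
    grid = [i / steps for i in range(1, steps)]  # 0.05, 0.10, ..., 0.95
    return max(
        ((t1, t2) for t1 in grid for t2 in grid if t2 < t1),
        key=lambda pair: accuracy(samples, *pair),
    )
```

Here each sample's similarity and suffix flag would come from running the identification pipeline on a sample URL, and the label from manual annotation. Ties between equally accurate threshold pairs go to the first pair generated; a production system might instead favour fewer illegitimate URLs classified as legitimate.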

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Classifications (CPC titles)

  • Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence)

  • Parsing

  • Traffic logging, e.g. anomaly detection

  • above the transport layer

  • Entity profiles

Frequently asked questions

Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
The A1 application was published on May 4, 2017. Legal status and post-grant events are not shown on this page.