System and method for filtering keywords

US10114889B2 · US · B2

Patent metadata
Publication number: US-10114889-B2
Application number: US-201314411465-A
Country: US
Kind code: B2
Filing date: May 15, 2013
Priority date: Jun 27, 2012
Publication date: Oct 30, 2018
Grant date: Oct 30, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for filtering information are described herein. In accordance with the present disclosure, a text acquisition module is configured to acquire text content to be filtered and a scanning module is configured to scan the text content to be filtered. The disclosed techniques scan the text content through a preset keyword dictionary, record a position of each keyword in the text content and acquire character pitch between keywords in the text content according to the position of each keyword in text content. A pitch judgment module is configured to judge whether the character pitch exceeds a preset character pitch and filter the keyword(s) in the text content in response to a determination that the character pitch exceeds the preset character pitch.
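The scanning and pitch steps described in the abstract can be illustrated with a minimal sketch. It assumes that a keyword's "position" is the index of its first character and that "character pitch" is the absolute difference between two such indexes; the function names `find_keyword_positions` and `character_pitches` are hypothetical and do not come from the patent.

```python
from itertools import combinations


def find_keyword_positions(text, dictionary):
    """Return {keyword: [start indexes]} for each dictionary keyword found in text."""
    positions = {}
    for keyword in dictionary:
        if not keyword:
            continue  # ignore empty dictionary entries
        start = text.find(keyword)
        while start != -1:
            positions.setdefault(keyword, []).append(start)
            start = text.find(keyword, start + 1)
    return positions


def character_pitches(positions):
    """Return {(kw_a, kw_b): smallest absolute difference between their positions}."""
    pitches = {}
    for a, b in combinations(sorted(positions), 2):
        pitches[(a, b)] = min(
            abs(pa - pb) for pa in positions[a] for pb in positions[b]
        )
    return pitches


# Hypothetical sample text and dictionary, for illustration only.
text = "buy cheap pills now, cheap offer"
positions = find_keyword_positions(text, ["cheap", "pills"])
pitches = character_pitches(positions)
print(positions)  # {'cheap': [4, 21], 'pills': [10]}
print(pitches)    # {('cheap', 'pills'): 6}
```

When a keyword occurs more than once, this sketch keeps the smallest pitch for each keyword pair. That is a design choice of the example, not something the abstract specifies.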

First claim

Opening claim text (preview).

The invention claimed is:

1. An improved information filtering system for filtering out sensitive information from content, which comprises: a processor; and a memory communicatively coupled to the processor and storing instructions that upon execution by the processor cause the system to: acquire text content; scan the text content through a preset keyword dictionary; in response to a determination that the text content contains a plurality of keywords stored in the preset keyword dictionary, determine a position of each of the plurality of keywords in the text content; determine at least one character pitch between any two of the plurality of keywords in the text content based on the position of each keyword among the plurality of keywords, wherein the at least one character pitch is a difference between positions of any two of the plurality of keywords in the text content; determine whether the at least one character pitch does not exceed a preset character pitch; in response to a determination that the at least one character pitch does not exceed the preset character pitch, filter out the plurality of keywords from the text content; wherein the preset keyword dictionary further stores a preset order of at least two keywords among all of the keywords that need to be filtered out; and wherein the memory further stores instructions that upon execution by the processor cause the system to: determine the order of the plurality of keywords according to the position of each keyword among the plurality of keywords in the text content, compare the order of the plurality of keywords in the text content with the preset order of corresponding keywords stored in the keyword dictionary, and when the order of the plurality of keywords in the text content matches the preset order of the corresponding keywords stored in the keyword dictionary, determine that the plurality of keywords satisfy the preset order.

2. The system according to claim 1, wherein the plurality of keywords are words constituting sensitive information and the preset keyword dictionary stores all of keywords that need to be filtered out.

3. The system according to claim 1, wherein the memory further stores instructions that upon execution by the processor cause the system to use a network spider to capture a web page to acquire the text content.

4. The system according to claim 1, wherein the memory further stores instructions that upon execution by the processor cause the system to acquire the text content by means of receiving the text content.

5. A method for improving sensitive information filtering, which comprises steps of: acquiring text content; scanning the text content through a preset keyword dictionary; in response to a determination that the text content contains a plurality of keywords stored in the preset keyword dictionary, determining a position of each of the plurality of keywords in the text content; determining at least one character pitch between any two of the plurality of keywords in the text content based on the position of each keyword among the plurality of keywords, wherein the at least one character pitch is a difference between positions of any two of the plurality of keywords in the text content; determining whether the at least one character pitch does not exceed a preset character pitch; in response to a determination that the at least one character pitch does not exceed the preset character pitch, filtering out the plurality of keywords from the text content; wherein the preset keyword dictionary further stores a preset order of at least two keywords among all of the keywords that need to be filtered out; and wherein the method further comprises: determining the order of the plurality of keywords according to the position of each keyword among the plurality of keywords in the text content, comparing the order of the plurality of keywords in the text content with the preset order of corresponding keywords stored in the keyword dictionary, and when the order of the plurality of keywords in the text content matches the preset order of the corresponding keywords stored in the keyword dictionary, determining that the plurality of keywords satisfy the preset order.

6. The method according to claim 5, wherein the plurality of keywords are words constituting sensitive information and the preset keyword dictionary stores all of keywords that need to be filtered out.

7. The method according to claim 5, wherein using a network spider to capture a web page to acquire the text content.

8. The method according to claim 5, wherein acquiring the text content by means of receiving the text content.

9. A non-transitory computer readable medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform operations for filtering keywords, the operations comprising: acquiring text content; scanning the text content through a preset keyword dictionary; in response to a determination that the text content contains a plurality of keywords stored in the preset keyword dictionary, determining a position of each of the plurality of keywords in the text content; determining at least one character pitch between any two of the plurality of keywords in the text content based on the position of each keyword among the plurality of keywords, wherein the at least one character pitch is a difference between positions of any two of the plurality of keywords in the text content; determining whether the at least one character pitch does not exceed a preset character pitch; in response to a determination that the at least one character pitch does not exceed the preset character pitch, filtering out the plurality of keywords from the text content; wherein the preset keyword dictionary further stores a preset order of at least two keywords among all of the keywords that need to be filtered out; and wherein the operations further comprises: determining the order of the plurality of keywords according to the position of each keyword among the plurality of keywords in the text content, comparing the order of the plurality of keywords in the text content with the preset order of corresponding keywords stored in the keyword dictionary, and when the order of the plurality of keywords in the text content matches the preset order of the corresponding keywords stored in the keyword dictionary, determining that the plurality of keywords satisfy the preset order.
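The independent claims combine a pitch test (the pitch must not exceed the preset value) with an order comparison against the dictionary. The following is a rough sketch of one way to read those steps, not the patented implementation. Several points are assumptions of the example: only the first occurrence of each keyword is considered, both the pitch test and the order test must pass before anything is filtered, and "filter out" is modeled as masking the keyword with `*`. The function name `filter_keywords` and its parameters are hypothetical.

```python
def filter_keywords(text, ordered_keywords, preset_pitch, mask="*"):
    """Mask the keywords if they appear in the preset order and close together.

    ordered_keywords: keywords listed in their preset order (hypothetical
        representation of one dictionary entry).
    preset_pitch: largest allowed difference between neighbouring keyword positions.
    """
    # Position of the first occurrence of each keyword (assumption: first hit only).
    positions = [text.find(keyword) for keyword in ordered_keywords]
    if len(positions) < 2 or any(pos == -1 for pos in positions):
        return text  # fewer than two keywords, or a keyword is missing

    neighbours = list(zip(positions, positions[1:]))

    # Order check: positions must increase in the same order as the preset order.
    in_order = all(a < b for a, b in neighbours)

    # Pitch check: the difference between positions must not exceed the preset pitch.
    close_enough = all(abs(b - a) <= preset_pitch for a, b in neighbours)

    if not (in_order and close_enough):
        return text

    # Model "filter out" as masking each matched keyword in place.
    chars = list(text)
    for keyword, pos in zip(ordered_keywords, positions):
        chars[pos:pos + len(keyword)] = mask * len(keyword)
    return "".join(chars)


# Hypothetical inputs, for illustration only.
print(filter_keywords("buy cheap pills now", ["cheap", "pills"], 10))
# buy ***** ***** now
print(filter_keywords("pills are not cheap", ["cheap", "pills"], 20))
# pills are not cheap  (order does not match the preset order)
```

Text in which the keywords are far apart, or appear in a different order than the dictionary specifies, passes through unchanged under this reading.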

Assignees

Beijing Qihoo Technology Co

Inventors

Classifications

  • G06F16/335 · Primary

    Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10114889B2 cover?
Techniques for filtering information are described herein. In accordance with the present disclosure, a text acquisition module is configured to acquire text content to be filtered and a scanning module is configured to scan the text content to be filtered. The disclosed techniques scan the text content through a preset keyword dictionary, record a position of each keyword in the text content a…
Who is the assignee on this patent?
Beijing Qihoo Technology Co
What technology area does this patent fall under?
Primary CPC classification G06F16/335. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 30, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).