Method and system for searching and storing data

US9715525B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9715525-B2
Application numberUS-201313929870-A
CountryUS
Kind codeB2
Filing dateJun 28, 2013
Priority dateJun 28, 2013
Publication dateJul 25, 2017
Grant dateJul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This invention relates to methods for storing and searching data. Embodiments of the invention make use of suffix trees to support binary pattern matching. Embodiments of the invention can be shown to have comparable search speeds to searches of known suffix trees, but are advantageous in that they have lower memory usage requirements which is important in large data environments.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method of determining whether a search string is contained in stored data which stores a plurality of data strings each in a structured form, the structured form including the storage of subsets of each data string of predetermined length stored in as the edges of a tree structure along with prefix data including all possible prefixes for each such edge which have length less than said predetermined length, the method including the steps of: dividing the search string into a plurality of sub-searches, the number of sub-searches being equal to the predetermined length of the subsets and each sub-search containing a prefix, a main and a postfix, by the steps of: setting the prefix of the first sub-search to be null, setting the prefix of the second sub-search to be the first item in the search string, and setting the prefix of the ith sub-search to be the first (i−1) items in the search string for all remaining sub-searches; setting the main of each sub-search to be those items in the search string immediately following the prefix which make up complete subsets of the search string, the subsets being of said predetermined length; setting the postfix of each sub-search to be those items at the end of the search string which are not included in the prefix or main or, if there are no such items, to be null, comparing each of the sub-searches to the stored data to determine if a match exists by, for each sub-search, until all sub-searches have been considered: determining if the main of the sub-search is contained in the subsets of any of the stored data strings, if it is not then moving to the next sub-search; if it is, determining if the prefix of the sub-search is contained in the prefix data stored for that subset, if it is not then moving to the next sub-search; if it is, determining if the postfix of the sub-search is contained in the stored data strings by: a) if the last data entry in the main of the sub-search matched a subset of a stored data string that has subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of any of those subsequent subsets; b) if the last data entry in the main of the sub-search matched a subset of a stored data string that has no subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of the postfix of that stored data string; or c) if the last data entry in the main of the sub-search matched part of a subset of a stored data string, determining if the postfix of the sub-search matches the initial part of the remaining part of that subset. 2. A method of finding a specified sequence in one or more stored DNA sequences, wherein the stored DNA sequences are each stored in a structured form, the structured form including the storage of subsets of each DNA sequence of predetermined length stored in as the edges of a tree structure along with prefix data including all possible prefixes for each such edge which have length less than said predetermined length, the method including the steps of: dividing the specified sequence into a plurality of sub-searches, the number of sub-searches being equal to the predetermined length of the subsets and each sub-search containing a prefix, a main and a postfix, by the steps of: setting the prefix of the first sub-search to be null, setting the prefix of the second sub-search to be the first item in the search string, and setting the prefix of the ith sub-search to be the first (i−1) items in the search string for all remaining sub-searches; setting the main of each sub-search to be those items in the search string immediately following the prefix which make up complete subsets of the search string, the subsets being of said predetermined length; setting the postfix of each sub-search to be those items at the end of the search string which are not included in the prefix or main or, if there are no such items, to be null, comparing each of the sub-searches to the stored data to determine if a match exists by, for each sub-search, until all sub-searches have been considered: determining if the main of the sub-search is contained in the subsets of any of the stored data strings, if it is not then moving to the next sub-search; if it is, determining if the prefix of the sub-search is contained in the prefix data stored for that subset, if it is not then moving to the next sub-search; if it is, determining if the postfix of the sub-search is contained in the stored data strings by: a) if the last data entry in the main of the sub-search matched a subset of a stored sequence that has subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of any of those subsequent subsets; b) if the last data entry in the main of the sub-search matched a subset of a stored sequence that has no subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of the postfix of that stored data string; or c) if the last data entry in the main of the sub-search matched part of a subset of a stored sequence, determining if the postfix of the sub-search matches the initial part of the remaining part of that subset. 3. A method of testing a pattern definition for a virus or other malware against known genuine data, wherein the genuine data is stored as a plurality of data strings each in a structured form, the structured form including the storage of subsets of each data string of predetermined length stored in as the edges of a tree structure along with prefix data including all possible prefixes for each such edge which have length less than said predetermined length, the method including the steps of: dividing the pattern definition into a plurality of sub-searches, the number of sub-searches being equal to the predetermined length of the subsets and each sub-search containing a prefix, a main and a postfix, by the steps of: setting the prefix of the first sub-search to be null, setting the prefix of the second sub-search to be the first item in the search string, and setting the prefix of the ith sub-search to be the first (i−1) items in the search string for all remaining sub-searches; setting the main of each sub-search to be those items in the search string immediately following the prefix which make up complete subsets of the search string, the subsets being of said predetermined length; setting the postfix of each sub-search to be those items at the end of the search string which are not included in the prefix or main or, if there are no such items, to be null, comparing each of the sub-searches to the stored data to determine if a match exists by, for each sub-search, until all sub-searches have been considered: determining if the main of the sub-search is contained in the subsets of any of the stored data strings, if it is not then moving to the next sub-search; if it is, determining if the prefix of the sub-search is contained in the prefix data stored for that subset, if it is not then moving to the next sub-search; if it is, determining if the postfix of the sub-search is contained in the stored data strings by: a) if the last data entry in the main of the sub-search matched a subset of a stored data string that has subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of any of those subsequent subsets; b) if the last data entry in the main of the sub-search matched a subset of a stored data string that has no subsequent subsets in the tree structure, determining if the postfix of the sub-search matches the initial part of the postfix of that stored data string; or c) if the last data entry in the main of the sub-search matched part of a subset of a stored data string, determining i

Assignees

Inventors

Classifications

  • of sub-queries or views · CPC title

  • by using string matching techniques · CPC title

  • Browsing; Visualisation therefor (browsing or visualisation for clustering or classification G06F16/358) · CPC title

  • Physics · mapped topic

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9715525B2 cover?
This invention relates to methods for storing and searching data. Embodiments of the invention make use of suffix trees to support binary pattern matching. Embodiments of the invention can be shown to have comparable search speeds to searches of known suffix trees, but are advantageous in that they have lower memory usage requirements which is important in large data environments.
Who is the assignee on this patent?
Khalifa Univ Of Science Tech And Res, British Telecomm, Emirates Telecommunications Corp, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/24535. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 25 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).