Who is the assignee on this patent?

Hirsch Michael, Bitner Haim, Aronovich Lior, and 4 more

What technology area does this patent fall under?

Primary CPC classification G06F11/1453. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 26, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for efficient data searching, storage and reduction

US9400796B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9400796-B2
Application number	US-40777409-A
Country	US
Kind code	B2
Filing date	Mar 19, 2009
Priority date	Sep 15, 2004
Publication date	Jul 26, 2016
Grant date	Jul 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
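A minimal Python sketch of the first phase (finding a similar repository segment) follows. It rests on assumptions the abstract does not state. The representation values of a segment are taken to be the k largest Rabin-Karp rolling hashes of its fixed-width byte windows. The index is an in-memory dict. Similarity is declared when at least `threshold` values match, in the spirit of claim 6. `WINDOW`, `K`, `window_hashes`, `representation_values` and `SimilarityIndex` are hypothetical names, not taken from the patent. Each input segment costs one linear hashing pass plus k average-O(1) dictionary lookups, so lookup cost does not grow with the number of stored segments as long as each value maps to only a few segments.

```python
from __future__ import annotations

import heapq
from collections import Counter, defaultdict

WINDOW = 8            # hypothetical sliding-window width, in bytes
K = 4                 # hypothetical number of representation values per segment
BASE = 257            # polynomial base, larger than any byte value
MOD = (1 << 61) - 1   # Mersenne prime modulus for the rolling hash


def window_hashes(data: bytes, window: int = WINDOW) -> list[int]:
    """Rabin-Karp rolling hash of every `window`-byte slice, in one linear pass."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for byte in data[:window]:
        h = (h * BASE + byte) % MOD
    hashes = [h]
    for i in range(window, len(data)):
        # Drop data[i - window], shift left by one position, append data[i].
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        hashes.append(h)
    return hashes


def representation_values(segment: bytes, k: int = K) -> list[int]:
    """Assumed selection rule: the k largest distinct window hashes of a segment."""
    return heapq.nlargest(k, set(window_hashes(segment)))


class SimilarityIndex:
    """Maps representation values to the ids of repository segments holding them."""

    def __init__(self, k: int = K, threshold: int = 2):
        self.k = k
        self.threshold = threshold  # matches required to declare similarity
        self.index: defaultdict[int, list[str]] = defaultdict(list)

    def add_segment(self, seg_id: str, segment: bytes) -> None:
        for value in representation_values(segment, self.k):
            self.index[value].append(seg_id)

    def find_similar(self, segment: bytes) -> str | None:
        """Return the repository segment with the most matches, if it meets the threshold."""
        votes: Counter[str] = Counter()
        for value in representation_values(segment, self.k):
            for seg_id in self.index.get(value, ()):  # average O(1) dict lookup
                votes[seg_id] += 1
        if not votes:
            return None
        best, count = votes.most_common(1)[0]
        return best if count >= self.threshold else None
```

Usage: call `add_segment` for each repository segment, then `find_similar` for each input segment. A `None` result means no repository segment reached the threshold. Changing a few bytes of a segment alters only the windows that cover them, so most of its representation values, and therefore its match count, survive the edit.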

Claims

Full claim text (claims 1 to 20).

What is claimed is:

1. A method of searching a repository of binary uninterpretted data for a location of common data to an input data comprising: analyzing segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing step including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.

2. The method of claim 1, wherein the index is stored in a memory faster than a memory storing the repository itself.

3. The method of claim 2, wherein the searching involves only the faster memory.

4. The method of claim 1, wherein the searching time is independent of repository size.

5. The method of claim 1, wherein the representation values are such that the number of matches indicates a degree of similarity.

6. The method of claim 1, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity.

7. The method of claim 6, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.

8. The method of claim 6, including a step of verifying the declared similarity.

9. The method of claim 1, wherein the index includes a location within the repository of the similarity matched portion.

10. The method of claim 6, including a step of acting upon a declared similarity by matching the similar data repository data.

11. The method of claim 1, including a step of data compression.

12. The method of claim 1, including a step of updating the repository and the index.

13. The method of claim 1, further comprising specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values; defining data intervals in each of the repository and input data based on the specified locations; wherein the analyzing the similar repository segment further comprises performing a binary difference process on the defined intervals, wherein sliding windows of the difference process for each of the repository and input data are at least sometimes positioned in non-matching offsets, wherein the data intervals defined for the repository data are of a different size than the data interval defined for the input data, wherein the index is stored in a memory faster than a memory storing the repository itself, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.

14. A method of searching a repository of binary uninterpretted data for a location of common data to an input data comprising: analyzing segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing step including searching an index of representation values of the repository data for matching representation values of the input data in a time independent of a size of the repository and linear in a size of the input data; specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values; and analyzing the similar repository segment with respect to the input segment to determine their common data sections while utilizing the specified locations for data alignment, in a time linear in a size of the input segment.

15. The method of claim 14, further comprising defining data intervals in each of the repository and input data based on the specified locations.

16. The method of claim 15, wherein the analyzing the similar repository segment further comprises performing a binary difference process on the defined intervals.

17. The method of claim 16, wherein sliding windows of the difference process for each of the repository and input data are at least sometimes positioned in non-matching offsets.

18. The method of claim 15, wherein the data intervals defined for the repository data are of a different size than the data interval defined for the input data.

19. The method of claim 18, wherein the data intervals defined for one of the repository data and the input data is one byte, while the data intervals for the other of the repository data and the input data is multiple bytes.

20. The method of claim 18, wherein the index is stored in a memory faster than a memory storing the repository itself, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.
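Claims 14 to 17 use the locations of matched values to align a similar repository segment with an input segment and then recover their common data sections. The Python sketch below is a simplified illustration, not the claimed binary difference process. It assumes anchors are exact 8-byte window matches found by sampling the input every `stride` bytes, in place of hashed representation values. Each anchor is then extended backward and forward byte by byte. `find_anchors` and `common_sections` are hypothetical names. Anchors that fall inside an already reported section are skipped, and backward extension stops at the previous section, so each input byte is scanned at most once. Apart from building the window dictionary for the repository segment, the work is therefore linear in the input segment size.

```python
from __future__ import annotations


def find_anchors(repo_seg: bytes, inp_seg: bytes,
                 window: int = 8, stride: int = 8) -> list[tuple[int, int]]:
    """Hypothetical anchor finder: (repository offset, input offset) pairs where an
    exact `window`-byte slice of the input, sampled every `stride` bytes, occurs."""
    first_seen: dict[bytes, int] = {}
    for off in range(len(repo_seg) - window + 1):
        first_seen.setdefault(repo_seg[off:off + window], off)
    anchors = []
    for off in range(0, len(inp_seg) - window + 1, stride):
        rep_off = first_seen.get(inp_seg[off:off + window])
        if rep_off is not None:
            anchors.append((rep_off, off))
    return anchors


def common_sections(repo_seg: bytes, inp_seg: bytes,
                    anchors: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Grow each anchor into a maximal run of identical bytes.

    Returns (repository start, input start, length) triples in input order.
    """
    sections = []
    covered_until = 0  # input bytes before this offset already belong to a section
    for rep_off, inp_off in sorted(anchors, key=lambda a: a[1]):
        if inp_off < covered_until:
            continue  # anchor lies inside a section that was already reported
        r, i = rep_off, inp_off
        # Extend backwards, but never into a previously reported section.
        while r > 0 and i > covered_until and repo_seg[r - 1] == inp_seg[i - 1]:
            r -= 1
            i -= 1
        end_r, end_i = rep_off, inp_off
        # Extend forwards from the anchor while the bytes agree.
        while (end_r < len(repo_seg) and end_i < len(inp_seg)
               and repo_seg[end_r] == inp_seg[end_i]):
            end_r += 1
            end_i += 1
        if end_i > i:
            sections.append((r, i, end_i - i))
            covered_until = end_i
    return sections
```

Because matching is anchored rather than positional, a section is found even when it has moved or new bytes were inserted before it. For example, an input that swaps the two halves of the repository segment and inserts new data between them yields one section per half, each with its own repository offset.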

Classifications

G06F16/1744 · using compression, e.g. sparse files
G06F16/2455 · Query execution
Y10S707/99953 · Recoverability
G06F2201/805 · Real-time
G06F16/2255 · Hash tables

Patent family

Related publications grouped by family.

Patent family ID: 36035351

