Systems and methods for handling data

US9002777B1 · US · B1

Patent metadata
Publication number: US-9002777-B1
Application number: US-2393108-A
Country: US
Kind code: B1
Filing date: Jan 31, 2008
Priority date: Jul 25, 2007
Publication date: Apr 7, 2015
Grant date: Apr 7, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for handling files to timely provide reports concerning the files is disclosed. The method may include crawling (or enumerating) the files, to figure out how many files/data are to be processed and/or how much processing work is to be performed. The method may also include processing the files in batches. Identification information (e.g., filenames, file paths, and/or object identifiers) pertaining to the files may be sent to one or more queues for batch processing of the files. The method may further include generating a report after processing of a batch among the batches is completed. The report may be generated before subsequent processing of a subsequent batch is completed.
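
The flow described in the abstract (enumerate, queue identifiers, process in batches, report after each batch) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `enumerate_files`, `process_in_batches`, and `BatchReport`, the single in-memory queue, and the example paths are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class BatchReport:
    """Hypothetical per-batch report; the patent does not define its fields."""
    batch_index: int
    files_processed: int
    files_remaining: int


def enumerate_files(paths: Iterable[str],
                    excluded: Callable[[str], bool]) -> List[str]:
    """Crawl step: keep the identifiers that are not excluded by the filter."""
    return [p for p in paths if not excluded(p)]


def process_in_batches(file_ids: List[str],
                       batch_size: int,
                       process: Callable[[str], None]) -> Iterator[BatchReport]:
    """Queue the identifiers, process them batch by batch, and yield a
    report as soon as each batch finishes (before the next batch is done)."""
    queue = deque(file_ids)      # identification information sent to a queue
    total = len(queue)           # known up front because of the crawl step
    done = 0
    batch_index = 0
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for file_id in batch:
            process(file_id)
        done += len(batch)
        yield BatchReport(batch_index, done, total - done)
        batch_index += 1


# Hypothetical usage: exclude temporary files, process two files per batch.
paths = ["/a/1.txt", "/a/2.tmp", "/b/3.txt", "/b/4.txt", "/c/5.txt"]
ids = enumerate_files(paths, excluded=lambda p: p.endswith(".tmp"))
for report in process_in_batches(ids, batch_size=2, process=lambda f: None):
    print(report)
```

Because `process_in_batches` is a generator, a caller receives each report while later batches are still pending, which mirrors the "report before the subsequent batch is completed" idea in the abstract.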

First claim

Opening claim text (preview).

What is claimed is:

1. A method for handling a plurality of files, the method comprising:
    receiving a service request comprising a filter to exclude a portion of the plurality of files from processing;
    enumerating at least a portion of the files not excluded by the filter using at least one crawler of a set of crawlers, the set of crawlers including a first crawler and a second crawler, wherein enumerating includes using the first crawler or the second crawler to determine a number of files in the portion of the files not excluded by the filter and to determine an amount of processing work to process the number of files in the portion of the files not excluded by the filter;
    identifying a first set of files of the portion of the files not excluded by the filter;
    excluding the first set of files from a second set of files, the second set of files to be processed, the second set of files including a first batch of the files and a second batch of the files;
    submitting a first set of file identifiers associated with the first batch to a first queue;
    spawning a first set of service providers in a first set of nodes according to first workload associated with the first queue;
    processing, using the first set of service providers, the first batch of the portion of the files not excluded by the filter;
    submitting a second set of file identifiers associated with the second batch to at least one of the first queue and a second queue;
    spawning a second set of service providers in a second set of nodes according to second workload associated with the at least one of the first queue and the second queue; and
    processing, using the second set of service providers, the second batch of the portion of the files not excluded by the filter.

2. The method of claim 1 further comprising: providing a first report when the processing the first batch of files is completed; and providing a second report when the processing the second batch of files is completed.

3. The method of claim 1 further comprising: saving a state pertaining to the handling the plurality of files when all of the at least the portion of the files have been processed, the state includes at least information pertaining to a restart path for processing the at least the portion of the files.

4. The method of claim 1 further comprising: grouping, according to at least a time attribute, the plurality of files into a plurality of time interval file groups, the plurality of time interval file groups including at least a first time interval file group associated with a first time interval and a second time interval file group associated with a second time interval.

5. The method of claim 4 wherein the time attribute represent at least one of a creation time, a modification time, an access time, and a retention time.

6. The method of claim 4 wherein the first time interval and the second time interval have different lengths.

7. The method of claim 4 further comprising: using the first crawler to enumerate at least the first time interval file group; and using the second crawler to enumerate at least the second time interval file group.

8. The method of claim 4 wherein the plurality of file groups further includes a third time interval file group associated with a third time interval, the method further comprising skipping the third time interval file group without enumerating the third time interval file group.

9. The method of claim 4 further comprising: using at least a first node of the first set of nodes to process a first set of files, the first set of files belonging to both the first batch of the files and the first time interval file group; and using at least a second node of the first set of nodes to process a second set of files, the second set of files belonging to both the first batch of the files and the second time interval file group.

10. The method of claim 4 wherein the plurality of file groups further includes a third time interval file group associated with a third time interval, the method further comprising skipping the third time interval file group without processing the third time interval file group.

11. The method of claim 4 further comprising mapping the plurality of time interval file groups and the plurality of files into a tree structure.

12. The method of claim 1 further comprising removing one or more links among the at least the portion of the files to form a tree structure for the at least the portion of the files.

13. The method of claim 12 further comprising performing the enumerating on the tree structure.

14. The method of claim 12 further comprising: mapping the tree structure into a list structure; and performing the enumerating on the list structure.

15. The method of claim 12 further comprising: segmenting the tree structure into a plurality of directories, the plurality of directories including at least a first directory and a second directory; providing the first directory to a first node; providing the second directory to a second node; using the first crawler to enumerate at least a portion of the first directory; and using the second crawler to enumerate at least a portion of the second directory.

16. The method of claim 15 further comprising: transferring a subdirectory of the first directory from the first node to the second node; and using the second crawler to enumerate the subdirectory of the first directory.

17. The method of claim 1 wherein the first batch of the files includes a first set of files that have changes according to one or more file system snapshots, the first batch of the files excluding a second set of files that have no changes according to the one or more file system snapshot.

18. The method of claim 17 wherein the changes include at least one change in at least one directory name pertaining to at least one file in the first set of files.

19. The method of claim 1 wherein the first set of identifiers represents changes in the first batch of the files according to one or more file system snapshots.

20. The method of claim 1 further comprising: creating a first checkpoint when a first set of files has been processed, the first checkpoint representing first state information pertaining to the at least the portion of the files; creating a second checkpoint when the first set of files and a second set of files have been processed, the second checkpoint representing second state information pertaining to the at least the portion of the files; and replacing the first checkpoint with the second checkpoint as an effective checkpoint for the at least the portion of the files.

21. The method of claim 20 further comprising discarding a previous checkpoint after the second checkpoint has been created, the previous checkpoint created prior to creation of the first checkpoint.

22. The method of claim 20 further comprising retaining the first checkpoint after the second checkpoint has been created.

23. The method of claim 20 further comprising discarding the first checkpoint after the second checkpoint has been created.

24. The method of claim 20 further comprising maintaining a plurality of checkpoints in a rolling fashion.

25. The method of claim 20 wherein the second set of files includes a first file and a second file, the first file enumerated before the second file, the first file processed after the second file.

26. The method of cla
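
The time-interval grouping recited in claims 4, 6, 8, and 10 can be illustrated with a small sketch. It assumes each file is represented as a hypothetical `(identifier, timestamp)` pair and that interval boundaries are supplied as a sorted list; neither representation comes from the patent, and the function names are invented for the example.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_time_interval(files: Iterable[Tuple[str, float]],
                           boundaries: List[float]) -> Dict[int, List[str]]:
    """Group files by a time attribute (e.g. modification time).

    `boundaries` must be sorted ascending. Group 0 holds timestamps below
    boundaries[0], group i holds boundaries[i-1] <= t < boundaries[i], and
    the last group holds everything at or above the final boundary, so the
    intervals may have different lengths.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for file_id, timestamp in files:
        groups[bisect_right(boundaries, timestamp)].append(file_id)
    return dict(groups)


def select_groups(groups: Dict[int, List[str]],
                  skipped: Set[int]) -> Dict[int, List[str]]:
    """Drop whole interval groups so they are neither enumerated nor processed."""
    return {index: ids for index, ids in groups.items() if index not in skipped}


# Hypothetical usage: two boundaries give three intervals of unequal length.
files = [("a", 50.0), ("b", 150.0), ("c", 399.0), ("d", 400.0)]
groups = group_by_time_interval(files, boundaries=[100.0, 400.0])
to_crawl = select_groups(groups, skipped={2})   # skip the newest interval
print(groups)
print(to_crawl)
```

Each remaining group could then be handed to a different crawler or node, in the spirit of claims 7 and 9; that assignment step is not shown here.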

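The checkpoint behaviour in claims 20 to 24 (a newer checkpoint replaces the older one as the effective checkpoint, while a limited history may be kept in a rolling fashion) might look like the sketch below. The `Checkpoint` fields, the `RollingCheckpoints` class, and the `keep` parameter are assumptions for illustration; the claims only say that a checkpoint represents state information, and claim 3 mentions a restart path.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical state snapshot; the field names are assumptions."""
    files_done: int
    restart_path: str


class RollingCheckpoints:
    """Keep at most `keep` checkpoints; the newest one is the effective one."""

    def __init__(self, keep: int = 2) -> None:
        # A bounded deque discards the oldest entry when a new one is added.
        self._history: Deque[Checkpoint] = deque(maxlen=keep)

    def create(self, files_done: int, restart_path: str) -> Checkpoint:
        checkpoint = Checkpoint(files_done, restart_path)
        self._history.append(checkpoint)
        return checkpoint

    @property
    def effective(self) -> Optional[Checkpoint]:
        """The checkpoint a restart would resume from, if any exists."""
        return self._history[-1] if self._history else None

    def retained(self) -> Tuple[Checkpoint, ...]:
        """All checkpoints still kept, oldest first."""
        return tuple(self._history)


# Hypothetical usage with made-up restart paths.
checkpoints = RollingCheckpoints(keep=2)
checkpoints.create(10, "/data/a")
checkpoints.create(20, "/data/b")
checkpoints.create(30, "/data/c")   # the "/data/a" checkpoint is discarded
print(checkpoints.effective)
print(checkpoints.retained())
```

With `keep=1` the previous checkpoint is discarded as soon as a new one is created (as in claim 23); with a larger `keep` earlier checkpoints are retained for a while (as in claims 22 and 24).
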
Assignees

Inventors

Classifications

  • G06Q10/00Primary

    Administration; Management · CPC title

  • Physics · mapped topic

  • G06F16/119Primary

    Details of migration of file systems (migration mechanisms in storage systems G06F3/0647) · CPC title

  • Clustering; Classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9002777B1 cover?
A method for handling files to timely provide reports concerning the files is disclosed. The method may include crawling (or enumerating) the files, to figure out how many files/data are to be processed and/or how much processing work is to be performed. The method may also include processing the files in batches. Identification information (e.g., filenames, file paths, and/or object identifier…
Who is the assignee on this patent?
Muddu Sudhakar, Tryfonas Christos, Maunder Anurag, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06Q10/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 7, 2015 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).