Framework supporting content delivery with reducer services network
US-9628344-B2 · Apr 18, 2017 · US
US11003692B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11003692-B2 |
| Application number | US-201514980572-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2015 |
| Priority date | Dec 28, 2015 |
| Publication date | May 11, 2021 |
| Grant date | May 11, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems, methods, and non-transitory computer-readable media can obtain a first batch of content items to be clustered. A set of clusters can be generated by clustering respective binary hash codes for each content item in the first batch, wherein content items included in a cluster are visually similar to one another. A next batch of content items to be clustered can be obtained. One or more respective binary hash codes for the content items in the next batch can be assigned to a cluster in the set of clusters.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: obtaining, by a computing system, a first batch of content items to be clustered; selecting, by the computing system, a set of the first batch of content items to be a set of cluster centers; inserting, by the computing system, cluster center binary hash codes of the set of cluster centers into a multi-index hash table; generating, by the computing system, a set of clusters of visually similar content items in the first batch of content items based at least in part on first binary hash codes associated with the first batch of content items, wherein the generating is based on an assignment of a content item of the first batch of content items to a nearest cluster center based on a look up of a nearest cluster center binary hash code in the multi-index hash table; updating, by the computing system, the cluster center binary hash codes based on means of the first binary hash codes; obtaining, by the computing system, a second batch of content items to be clustered; assigning, by the computing system, at least a portion of second binary hash codes associated with the second batch of content items to clusters in the set of clusters based on distances between the second binary hash codes and cluster centers in the set of clusters; and generating, by the computing system, at least one new cluster based on unassigned second binary hash codes that are within a threshold distance to each other. 2. The computer-implemented method of claim 1 , wherein obtaining, by the computing system, the first batch of content items to be clustered further comprises at least one of: determining that a threshold number of content items have been received, or determining that a threshold period of time has elapsed. 3. The computer-implemented method of claim 1 , wherein generating, by the computing system, the set of clusters further comprises: generating the first binary hash codes associated with the first batch of content items; and determining, for each content item in the first batch of content items, a corresponding cluster from the set of clusters based on the look up of the nearest cluster center binary hash code in the multi-index hash table. 4. The computer-implemented method of claim 3 , wherein determining, for each content item in the first batch of content items, the corresponding cluster from the set of clusters further comprises: determining, for each content item in the first batch of content items, a respective distance between the first binary hash code for the content item and the nearest cluster center for the content item; and assigning each content item in the first batch of content items to the nearest cluster center for the content item based at least in part on the respective distance satisfying a threshold. 5. The computer-implemented method of claim 1 , wherein the set of cluster centers are randomly selected. 6. The computer-implemented method of claim 1 , wherein the set of clusters of visually similar content items is generated further based at least in part on a binary k-means clustering approach. 7. The computer-implemented method of claim 1 , the method further comprising: removing, from each cluster in the set of clusters, content items in the cluster that are at least a threshold distance from a cluster center corresponding to the cluster. 8. The computer-implemented method of claim 1 , the method further comprising: removing, from the set of clusters, clusters that are smaller than a threshold size. 9. The computer-implemented method of claim 1 , wherein assigning, by the computing system, the at least the portion of the second binary hash codes associated with the second batch of content items further comprises: determining a nearest cluster center for each content item in the second batch of content items; and determining, for each content item in the second batch of content items, that a respective distance between the second binary hash code for the content item and a binary hash code corresponding to the nearest cluster center for the content item satisfies a threshold. 10. The computer-implemented method of claim 1 , the method further comprising: determining that a content item is categorized as spam; determining a cluster in which the content item was clustered, the cluster including a plurality of other content items; and determining that the plurality of other content items in the cluster are spam. 11. A system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform: obtaining a first batch of content items to be clustered; selecting a set of the first batch of content items to be a set of cluster centers; inserting cluster center binary hash codes of the set of cluster centers into a multi-index hash table; generating a set of clusters of visually similar content items in the first batch of content items based at least in part on first binary hash codes associated with the first batch of content items, wherein the generating is based on an assignment of a content item of the first batch of content items to a nearest cluster center based on a look up of a nearest cluster center binary hash code in the multi-index hash table; updating the cluster center binary hash codes based on means of the first binary hash codes; obtaining a second batch of content items to be clustered; assigning at least a portion of secondary binary hash codes associated with the second batch of content items of clusters in the set of clusters based on distances between the second binary hash codes and cluster centers in the set of clusters; and generating at least one new cluster based on unassigned second binary hash codes that are within a threshold distance to each other. 12. The system of claim 11 , wherein obtaining the first batch of content items to be clustered further comprises at least one of: determining that a threshold number of content items have been received, or determining that a threshold period of time has elapsed. 13. The system of claim 11 , wherein generating the set of clusters further comprises: generating the first binary hash codes associated with the first batch of content items; and determining, for each content item in the first batch of content items, a corresponding cluster from the set of clusters based on the look up of the nearest cluster center binary hash code in the multi-index hash table. 14. The system of claim 13 , wherein determining, for each content item in the first batch of content items, the corresponding cluster from the set of clusters further comprises: determining, for each content item in the first batch of content items, a respective distance between the first binary hash code for the content item and the nearest cluster center for the content item; and assigning each content item in the first batch of content items to the nearest cluster center for the content item based at least in part on the respective distance satisfying a threshold. 15. The system of claim 11 , wherein the set of cluster centers are randomly selected. 16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform a method comprising: obtaining a first batch of content items to be clustered; selecting a set of the first batch of content items to be a set of cluster centers; inserting cluster center binary hash codes of the set of cluster centers
Machine learning · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Clustering or classification · CPC title
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.