Search infrastructure

US11580176B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11580176-B2
Application number  US-202016877452-A
Country             US
Kind code           B2
Filing date         May 18, 2020
Priority date       Aug 17, 2012
Publication date    Feb 14, 2023
Grant date          Feb 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for real-time search, including: a set of partitions, each including a set of segments, each segment corresponding to a time slice of messages posted to the messaging platform, and a real-time search engine configured to receive a search term in parallel with other partitions in the set of partitions, and search at least one of the set of segments in reverse chronological order of the corresponding time slice to identify document identifiers of messages containing the search term; and a search fanout module configured to: receive a search query including the search term; send the search term to each of the set of partitions for parallel searching; and return, in response to the search query, at least one of the identified document identifiers of messages containing the search term.
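The arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Segment`, `Partition`, and `fanout_search` are hypothetical, messages are matched by whole lowercase words, and document identifiers are assumed to increase with posting time so that sorting them in descending order gives reverse chronological order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One time slice of messages (assumption: doc IDs grow with posting time)."""
    start_time: int
    messages: dict = field(default_factory=dict)  # doc_id -> message text

    def search(self, term: str) -> list:
        # Walk the slice newest-first so recent matches are found first.
        return [doc_id for doc_id in sorted(self.messages, reverse=True)
                if term in self.messages[doc_id].lower().split()]


@dataclass
class Partition:
    segments: list

    def search(self, term: str, limit: int) -> list:
        hits = []
        # Visit segments from the newest time slice to the oldest.
        for segment in sorted(self.segments, key=lambda s: s.start_time, reverse=True):
            hits.extend(segment.search(term))
            if len(hits) >= limit:
                break  # enough recent matches; older segments are skipped
        return hits[:limit]


def fanout_search(partitions: list, term: str, limit: int = 10) -> list:
    """Send the term to every partition in parallel and merge the results."""
    term = term.lower()
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_partition = list(pool.map(lambda p: p.search(term, limit), partitions))
    # Merge newest-first across partitions (relies on the doc ID assumption above).
    merged = sorted((d for hits in per_partition for d in hits), reverse=True)
    return merged[:limit]


partitions = [
    Partition([Segment(0, {1: "hello world", 5: "search at scale"}),
               Segment(100, {9: "real-time search"})]),
    Partition([Segment(0, {2: "nothing here", 4: "search engines"}),
               Segment(100, {8: "more search results"})]),
]
print(fanout_search(partitions, "search", limit=3))  # [9, 8, 5]
```

Each partition stops as soon as it has `limit` matches, which is one reason a newest-first scan suits real-time search: older segments are only touched when recent ones do not yield enough results.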

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computer processor; a partition of a data repository, the partition comprising: a first segment comprising a first time slice of documents that were added to the data repository at respective times during a first time period corresponding to the first time slice; a query cache associated with the first segment and comprising, for each document in the first time slice, a respective value for a first binary attribute, wherein the value for the first binary attribute identifies a classification for the document; and a real-time search engine executing on the computer processor and configured to: receive a search request comprising a first binary search term; search the query cache in reverse chronological order of the first time period, based on the respective times at which the documents in the first time slice were added to the data repository, comprising matching the first binary search term to a particular value of the first binary attribute; generate, based on searching the query cache, a result set comprising respective document identifiers of a subset of the first time slice of documents, wherein each document in the subset has the particular value of the first binary attribute that matches the first binary search term; and return the result set in response to the search request.

2. The system of claim 1, wherein the real-time search engine is further configured to: calculate relevance scores for at least a portion of the first time slice of documents, wherein the relevance scores are calculated based on timeliness of the portion of the first time slice of documents, and wherein the subset of the first time slice of documents is selected for inclusion in the result set based on the calculated relevance scores.

3. The system of claim 1, wherein: the query cache further comprises, for each document in the first time slice, a respective value for a second binary attribute, wherein the value for the second binary attribute identifies a second classification for the document; the search request further comprises a second binary search term; searching the query cache in reverse chronological order of the first time period further comprises matching the second binary search term to a particular value of the second binary attribute; and each document in the subset has the particular value of the second binary attribute that matches the second binary search term.

4. The system of claim 1, wherein: the partition further comprises a second segment comprising a second time slice of documents that were added to the data repository at respective times during a second time period corresponding to the second time slice; the query cache is further associated with the second segment and the query cache comprises, for each document in the second time slice, a respective value for the first binary attribute; and the real-time search engine is further configured to search the query cache in reverse chronological order of the second time period, based on the respective times at which the documents in the second time slice were added to the data repository, wherein the result set further comprises respective document identifiers of a subset of the second time slice of documents.

5. The system of claim 1, further comprising: a plurality of partitions comprising the partition; and a search fanout module configured to: receive the search request comprising the first binary search term; send the search request to the plurality of partitions for parallel searching; receive a plurality of result sets from the plurality of partitions, wherein the plurality of result sets comprises the result set; generate a final result set comprising document identifiers from the plurality of result sets; and return the final result set in response to the search request.

6. The system of claim 1, wherein: the partition comprises a single writer thread; and searching the query cache further comprises: identifying a last update identifier indicating a last update point of the query cache; identifying a last written document identifier designating a position of the single writer thread of the partition; identifying, based on the last update identifier and the last written document identifier, a stale portion of the query cache corresponding to a fresh portion of a postings list of the first segment; refreshing the stale portion of the query cache; and determining a safe search range of the postings list, wherein the refreshed portion is within the safe search range.

7. The system of claim 1, wherein: the first binary attribute comprises one selected from a group consisting of a top contributor flag, a top document flag, a spam flag, an includes image flag, an includes video flag, an includes news flag, an includes pornography flag, and an includes antisocial user flag.

8. A method comprising: receiving a search request comprising a first binary search term; identifying respective document identifiers of a first time slice of documents, wherein the first time slice of documents is stored in a first segment of a partition of a data repository, and wherein the documents in the first time slice were added to the data repository at respective times during a first time period corresponding to the first time slice; accessing, by a computer processor, a query cache associated with the first segment and comprising, for each document in the first time slice, a respective value for a binary attribute, wherein the value for the binary attribute identifies a classification for the document; searching, by the computer processor, the query cache in reverse chronological order of the first time period, based on the respective times at which the documents in the time slice were added to the data repository, comprising matching the first binary search term to a particular value of the first binary attribute; generating, based on searching the query cache, a result set comprising the respective document identifiers of a subset of the first time slice of documents, wherein each document in the subset has the particular value of the first binary attribute that matches the first binary search term; and returning the result set in response to the search request.

9. The method of claim 8, further comprising calculating relevance scores for at least a portion of the first time slice of documents, wherein the relevance scores are calculated based on timeliness of the portion of the first time slice of documents, and wherein the subset of the first time slice of documents is selected for inclusion in the result set based on the calculated relevance scores.

10. The method of claim 8, wherein: the query cache further comprises, for each document in the first time slice, a respective value for a second binary attribute, wherein the value for the second binary attribute identifies a second classification for the document; the search request further comprises a second binary search term; searching the query cache in reverse chronological order of the first time period further comprises matching the second binary search term to a particular value of the second binary attribute; and each document in the subset has the particular value of the second binary attribute that matches the second binary search term.

11. The method of claim 8, wherein: the partition further comprises a second segment comprising a second time slice of documents that were added to the data repository at respective times during a second time period corresponding to the second time slice; the query cache is further associated with the second segment and the query cache comprises, for

Assignees

Twitter Inc

Inventors

Classifications

  • by using parallel associative memories or content-addressable memories · CPC title

  • using ranking · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Data stream processing; Continuous queries · CPC title

  • G06F16/953 · Primary

    Querying, e.g. by the use of web search engines · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580176B2 cover?
A system for real-time search, including: a set of partitions, each including a set of segments, each segment corresponding to a time slice of messages posted to the messaging platform, and a real-time search engine configured to receive a search term in parallel with other partitions in the set of partitions, and search at least one of the set of segments in reverse chronological order of …
Who is the assignee on this patent?
Twitter Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 14, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).