Search infrastructure

US10878042B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10878042-B2
Application number  US-201314422150-A
Country             US
Kind code           B2
Filing date         Aug 16, 2013
Priority date       Aug 17, 2012
Publication date    Dec 29, 2020
Grant date          Dec 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for real-time search, including: a set of partitions, each including a set of segments, each segment corresponding to a time slice of messages posted to the messaging platform, and a real-time search engine configured to receive a search term in parallel with other partitions in the set of partitions, and search at least one of the set of segments in reverse chronological order of the corresponding time slice to identify document identifiers of messages containing the search term; and a search fanout module configured to: receive a search query including the search term; send the search term to each of the set of partitions for parallel searching; and return, in response to the search query, at least one of the identified document identifiers of messages containing the search term.
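The arrangement in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Segment`, `Partition`, and `fanout_search` names, the in-memory dictionary used as a postings list, the thread pool, and the assumption that a larger document identifier means a newer message are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Postings for one time slice: term -> document identifiers."""
    start: int  # hypothetical start of the time slice (any sortable unit)
    postings: dict[str, list[int]] = field(default_factory=dict)

    def add(self, doc_id: int, text: str) -> None:
        # Index each distinct lowercase term once per message.
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)


@dataclass
class Partition:
    segments: list[Segment] = field(default_factory=list)

    def search(self, term: str, limit: int) -> list[int]:
        hits: list[int] = []
        # Visit the segment with the most recent time slice first.
        for seg in sorted(self.segments, key=lambda s: s.start, reverse=True):
            # Within a segment, walk the postings newest-first as well.
            for doc_id in reversed(seg.postings.get(term, [])):
                hits.append(doc_id)
                if len(hits) >= limit:
                    return hits  # stop early once enough hits are found
        return hits


def fanout_search(partitions: list[Partition], term: str, limit: int = 10) -> list[int]:
    """Send the term to every partition in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_partition = pool.map(lambda p: p.search(term, limit), partitions)
        merged = [doc_id for hits in per_partition for doc_id in hits]
    # Assumption: larger document identifiers are newer messages.
    return sorted(merged, reverse=True)[:limit]
```

Searching newest-first lets each partition stop as soon as it has enough hits, which is why the sketch returns early inside the segment loop rather than scanning every time slice.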

First claim

Opening claim text (preview).

What is claimed is:

1. A system for real-time search in a messaging platform, comprising: one or more computers including one or more a computer processors and one or more storage devices, the one or more computers being configured to provide; a fanout module configured to: receive a search query comprising one or more search terms; provide the search query to each of a plurality of partitions in parallel; and receive, from one or more of the partitions in response to the search query, one or more document identifiers corresponding to messages containing one or more of the search terms; the plurality of partitions, each partition comprising: a set of segments, wherein each segment of the set of segments stores a postings list representing messages broadcast to the messaging platform during a particular time slice defining a particular time range in which the messages represented by the segment were broadcast, wherein the messages were broadcast by respective user accounts of the messaging platform, wherein each segment of the partition corresponds to a different time slice; and a real-time search engine configured to: receive the one or more terms of the search query in parallel with search engines of the other partitions in the plurality of partitions; and search at least one segment of the set of segments, wherein the at least one segment is searched in reverse chronological order according to the time range specified by the time slice of each segment of the partition such that the segment with the most recent time range is searched first, wherein the search is performed to identify document identifiers of messages from the postings list containing one or more of the search terms.

2. The system of claim 1, wherein the real-time search engine is further configured to: calculate a relevance score for each of the identified document identifiers; rank the document identifiers in order of the calculated relevance scores; and send a highest ranked subset of the document identifiers to the search fanout module, and wherein the one or more document identifiers are selected from the highest ranked subsets sent from each of the plurality of partitions.

3. The system of claim 2, wherein the search fanout module further comprises functionality to: receive the highest ranked subsets of the document identifiers from each of the plurality of partitions; and select the one or more document identifiers from the highest ranked subsets based on the calculated relevance scores.

4. The system of claim 2, wherein the relevance score for each of the document identifiers is calculated based on a set of linear weights associated with the document identifier and a set of non-linear weights associated with the document identifier.

5. The system of claim 4, wherein the relevance score for each of the document identifiers is calculated using the following formula: score(t) = Σ L_t * Π B_t, wherein t is the document identifier, wherein L_t is the set of linear weights associated with the document identifier, and wherein B_t is the set of non-linear weights associated with the document identifier.

6. The system of claim 1, wherein each of the plurality of partitions further comprises: a query cache comprising a set of binary attributes for each document identifier in the set of segments of the partition, and wherein the real-time search engine is further configured to: receive a binary attribute with the search term, wherein searching at least one segment of the set of segments to identify the document identifiers is limited to the entries having the binary attribute, wherein the binary attribute is one selected from a group consisting of a top contributor flag, a top message flag, a spam flag, an includes image flag, an includes video flag, and an includes news flag.

7. The system of claim 1, further comprising a message ingester configured to: receive a request to index a new message broadcasted by the messaging platform; select a partition of the plurality of partitions for indexing the new message; and send a document identifier of the new message to the selected partition for inclusion in a current time slice of the partition.

8. The system of claim 7, wherein each of the plurality of partitions further comprises a single writer thread configured to: select an oldest segment of the set of segments corresponding to an oldest time slice; and overwrite the oldest segment with document identifiers broadcast during the current time slice.

9. A method for real-time search in a messaging platform, comprising: receiving a search query comprising one or more search terms; sending the search query to each of a plurality of partitions for parallel searching, wherein each partition of the plurality of partitions comprises a set of segments, and wherein each segment of the set of segments stores a postings list representing messages broadcast to the messaging platform during a particular time slice defining a particular time range in which the messages represented by the segment were broadcast, wherein the messages were broadcast by respective user accounts of the messaging platform, wherein each segment of the partition corresponds to a different time slice; for each partition, in parallel with other partitions in the plurality of partitions: searching, using a computer processor, at least one segment of the set of segments of the partition, wherein the at least one segment is searched in reverse chronological order according to the time range specified by the time slice of each segment of the partition such that the segment with the most recent time range is searched first, wherein the search is performed to identify one or more document identifiers of messages from the postings list containing one or more of the search terms; and returning, in response to the search query, at least one of the identified document identifiers of messages containing the search term.

10. The method of claim 9, further comprising: for each partition, in parallel with other partitions in the plurality of partitions: calculating a relevance score for each of the identified document identifiers; ranking the document identifiers in order of the calculated relevance scores; and sending a highest ranked subset of the document identifiers to a search fanout module; and wherein the at least one document identifier is selected from the highest ranked subsets sent from each of the plurality of partitions.

11. The method of claim 10, further comprising: receiving the highest ranked subsets of the document identifiers from each of the plurality of partitions; and selecting the at least one document identifier from the highest ranked subsets based on the calculated relevance scores.

12. The method of claim 9, wherein: each of the plurality of partitions comprises a query cache comprising a set of binary attributes for each document identifier in the set of segments of the partition; and the method further comprises receiving a binary attribute with the search term, wherein searching the at least one segment to identify the document identifiers is limited to the entries having the binary attribute.

13. The method of claim 12, wherein each partition of the plurality of partitions comprises only a single writer thread, and wherein the method further comprises: identifying a last update identifier indicating a last update point of the query cache; identifying a last written document identifier designating a position of the single writer thread of the partition; identifying, based on the last update identifier and the last written document
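The scoring and ranking steps recited in claims 2 through 5 can be sketched as follows. This is one possible reading, not the patent's implementation: the function names are hypothetical, and the formula is interpreted here as the sum of the linear weights multiplied by the product of the non-linear weights, which is an assumption about operator grouping that the claim text does not spell out.

```python
import heapq
import math


def relevance_score(linear: list[float], nonlinear: list[float]) -> float:
    """score(t) read as (sum of linear weights) * (product of non-linear weights)."""
    return sum(linear) * math.prod(nonlinear)


def top_k(scored: dict[int, float], k: int) -> list[tuple[float, int]]:
    """Per-partition step: keep the k highest-scoring (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in scored.items()))


def merge_top_k(subsets: list[list[tuple[float, int]]], k: int) -> list[int]:
    """Fanout step: pick the overall best k document identifiers by score."""
    best = heapq.nlargest(k, (pair for subset in subsets for pair in subset))
    return [doc_id for _, doc_id in best]
```

In this sketch each partition only ships its own highest ranked subset, so the fanout step compares at most k pairs per partition instead of every matching document.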

Assignees

Inventors

Classifications

  • by using parallel associative memories or content-addressable memories · CPC title

  • Data stream processing; Continuous queries · CPC title

  • using ranking · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • G06F16/953 · Primary

    Querying, e.g. by the use of web search engines · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10878042B2 cover?
A system for real-time search, including: a set of partitions, each including a set of segments, each segment corresponding to a time slice of messages posted to the messaging platform, and a real-time search engine configured to receive a search term in parallel with other partitions in the set of partitions, and search at least one of the set of segments in reverse chronological order of the …
Who is the assignee on this patent?
Twitter Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 29, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).