Multi-user search system with methodology for bypassing instant indexing
US-9792315-B2 · Oct 17, 2017 · US
US11157477B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11157477-B2 |
| Application number | US-201816202215-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 28, 2018 |
| Priority date | Nov 28, 2018 |
| Publication date | Oct 26, 2021 |
| Grant date | Oct 26, 2021 |
Official abstract text for this publication.
A method, computer system, and computer program product for segment differential-based document text-index modeling are provided. The embodiment may include receiving, by a processor, a document with a valid document ID and version ID tuple. The embodiment may also include determining the received document is a new version of a previously stored document and consequently multiplexing versions of the document into a single indexed document. The embodiment may further include segmenting the received document and building a token vector. The embodiment may also include calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation. The embodiment may further include in response to the calculated difference being below a pre-configured threshold value, discarding the received new version.
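The ingest flow in the abstract (segment, build a token vector, compare with the stored version, discard when the difference is below a threshold) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: paragraphs serve as segments, the difference is the fraction of segment positions whose tokens changed, and `VersionedStore`, `segment_difference`, and the default threshold of 0.25 are hypothetical.

```python
from __future__ import annotations


def token_vector(text: str) -> list[list[str]]:
    """Split text into segments (assumed: blank-line-separated paragraphs)
    and lowercase-tokenize each segment on whitespace."""
    segments = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [[tok.lower() for tok in seg.split()] for seg in segments]


def segment_difference(old: list[list[str]], new: list[list[str]]) -> float:
    """Fraction of segment positions whose tokens differ (assumed metric)."""
    n = max(len(old), len(new))
    if n == 0:
        return 0.0
    changed = 0
    for i in range(n):
        old_seg = old[i] if i < len(old) else None
        new_seg = new[i] if i < len(new) else None
        if old_seg != new_seg:
            changed += 1
    return changed / n


class VersionedStore:
    """Hypothetical store that keeps the last accepted version per document."""

    def __init__(self, threshold: float = 0.25) -> None:
        self.threshold = threshold
        # doc_id -> (version_id, token vector of the last accepted version)
        self.latest: dict[str, tuple[int, list[list[str]]]] = {}

    def receive(self, doc_id: str, version_id: int, text: str) -> bool:
        """Return True if the version was stored, False if it was discarded."""
        vec = token_vector(text)
        prev = self.latest.get(doc_id)
        if prev is not None and segment_difference(prev[1], vec) < self.threshold:
            return False  # difference below threshold: discard the new version
        # First version, or difference at/above threshold (equality is not
        # specified in the source; treating it as an update is an assumption).
        self.latest[doc_id] = (version_id, vec)
        return True
```

In this sketch a discarded version leaves the stored token vector untouched, so the next version is compared against the last accepted one rather than against the discarded one. Whether the patented method behaves the same way is not stated in the text shown here.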
Opening claim text (preview).
What is claimed is:
1. A processor-implemented method for storing and processing a query in a document corpus utilizing a segment level differential document text-index model, the method comprising: receiving, by a processor, a document with a valid document ID and version ID tuple; determining the received document is a new version of a previously stored document and, consequently, multiplexing multiple versions of the document into a single indexed document; segmenting the received document and building a token vector; calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation; in response to the calculated difference being below a pre-configured threshold value, discarding the received new version; in response to the calculated difference exceeding the pre-configured threshold value, updating the token vector and an index with a token stream of the received new version of the document; executing a query with terms to receive hits across all versions of the document, wherein posting lists for a particular term and payloads of each position from the all versions of the documents are loaded, wherein token positions that are shared among the all versions of the document are analyzed to search for the particular term; and displaying a result set to a user indicating that the particular term was found among the all versions of the document.
2. The method of claim 1, wherein the difference is calculated based on cumulative differentials between segments of the current version of the document and the previous version of the document.
3. The method of claim 1, wherein the difference is calculated based on semantic differences between segments of the current version and the previous version of the document.
4. The method of claim 1, wherein the difference is calculated based on counting a total number of words in each version of the document.
5. The method of claim 1, wherein the difference is measured based on a comparison of each token vector of each document.
6. The method of claim 1, further comprising: generating a version-specific payload for each token which indicates when a new term is detected in the new version.
7. The method of claim 1, further comprising: modifying a term string when adding the token stream to an index by placing special characters or marks at the end of a token to indicate the term string is valid after a preconfigured number of document versions.
8. The method of claim 1, further comprising: storing token vectors as separate data when a difference between two token streams exceeds a pre-configured threshold value.
9. The method of claim 1, further comprising: loading a posting list of a term and a term with special marks or characters when a user searches for a term in specified versions of a document; loading a payload of each term indicating each position and version; and using version information to filter a requisite query result.
10. A computer system for avoiding a high object version explosion in processing a query in a document utilizing a segment differential-based document text-index modeling, the computer system comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more tangible storage media for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: receiving, by a processor, a document with a valid document ID and version ID tuple; determining the received document is a new version of a previously stored document and, consequently, multiplexing multiple versions of the document into a single indexed document; segmenting the received document and building a token vector; calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation; in response to the calculated difference being below a pre-configured threshold value, discarding the received new version; in response to the calculated difference exceeding the pre-configured threshold value, updating the token vector and an index with a token stream of the received new version of the document; executing a query with terms to receive hits across all versions of the document, wherein posting lists for a particular term and payloads of each position from the all versions of the documents are loaded, wherein token positions that are shared among the all versions of the document are analyzed to search for the particular term; and displaying a result set to a user indicating that the particular term was found among the all versions of the document.
11. The computer system of claim 10, wherein the difference is calculated based on cumulative differentials between segments of the current version of the document and the previous version of the document.
12. The computer system of claim 10, wherein the difference is calculated based on semantic differences between segments of the current version and the previous version of the document.
13. The computer system of claim 10, wherein the difference is calculated based on counting a total number of words in each version of the document.
14. The computer system of claim 10, wherein the difference is measured based on a comparison of each token vector of each document.
15. The computer system of claim 10, further comprising: generating a version-specific payload for each token which indicates when a new term is detected in the new version.
16. The computer system of claim 10, further comprising: modifying a term string when adding the token stream to an index by placing special characters or marks at the end of a token to indicate the term string is valid after a preconfigured number of document versions.
17. The computer system of claim 10, further comprising: storing token vectors as separate data when a difference between two token streams exceeds a pre-configured threshold value.
18. The computer system of claim 10, further comprising: loading a posting list of a term and a term with special marks or characters when a user searches for a term in specified versions of a document; loading a payload of each term indicating each position and version; and using version information to filter a requisite query result.
19. A computer program product for avoiding a high object version explosion in processing a query in a document utilizing a segment differential-based document text-index modeling, the computer program product comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more tangible storage media, the program instructions executable by a processor of a computer to perform a method, the method comprising: receiving, by a processor, a document with a valid document ID and version ID tuple; determining the received document is a new version of a previously stored document and, consequently, multiplexing multiple versions of the document into a single indexed document; segmenting the received document and building a token vector; calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation; in response to the calculated difference being below
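The query steps recited in the claims (posting lists whose payloads carry position and version, hits returned across all versions, and optional filtering by version) can be sketched roughly as follows. This is an illustrative simplification, not the claimed implementation: the class `MultiVersionIndex`, its method names, and the choice to store each position once with the set of versions that share it are all assumptions. The special end-of-token marks described in the dependent claims are not modeled.

```python
from __future__ import annotations

from collections import defaultdict


class MultiVersionIndex:
    """Hypothetical index that multiplexes all versions of a document
    into one posting entry per (term, document)."""

    def __init__(self) -> None:
        # term -> doc_id -> {token position: set of version IDs at that position}
        self.postings: dict[str, dict[str, dict[int, set[int]]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def add_version(self, doc_id: str, version_id: int, tokens: list[str]) -> None:
        """Index one version; a position shared by several versions is
        stored once, with every sharing version recorded in its payload."""
        for pos, tok in enumerate(tokens):
            by_pos = self.postings[tok.lower()][doc_id]
            by_pos.setdefault(pos, set()).add(version_id)

    def search(self, term: str, versions: set[int] | None = None) -> dict[str, list[int]]:
        """Return doc_id -> sorted version IDs containing the term.
        If `versions` is given, the payload's version info filters the result."""
        hits: dict[str, list[int]] = {}
        for doc_id, by_pos in self.postings.get(term.lower(), {}).items():
            found: set[int] = set()
            for version_ids in by_pos.values():
                found |= version_ids
            if versions is not None:
                found &= versions
            if found:
                hits[doc_id] = sorted(found)
        return hits
```

A short usage example under the same assumptions:

```python
idx = MultiVersionIndex()
idx.add_version("d1", 1, ["quick", "brown", "fox"])
idx.add_version("d1", 2, ["quick", "red", "fox"])
all_hits = idx.search("fox")              # hits across all versions
v2_only = idx.search("brown", versions={2})  # filtered by version
```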
Selection or weighting of terms for indexing · CPC title
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Management thereof · CPC title