On-demand, dynamic and optimized indexing in natural language processing
US-10949409-B2 · Mar 16, 2021 · US
US11675769B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11675769-B2 |
| Application number | US-202117174085-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 11, 2021 |
| Priority date | Dec 13, 2017 |
| Publication date | Jun 13, 2023 |
| Grant date | Jun 13, 2023 |
Official abstract text for this publication.
In indexing for natural language processing, a request is received from a user to access a document at a server, and the server routes the request to an indexing server. A validation service checks whether the collision resistant unique identifier (CUID) of the document is available in the indexing server repository or in a file system associated with the indexing server. If the CUID exists, the service determines whether the timestamp of the new document matches the timestamp of the previously indexed document. If both conditions are fulfilled, the previously indexed data is returned to the server. If they are not, a transformation service is invoked at the indexing server. The transformation service compares a hash value of a dataset. If the transformation service determines that the hash value of a dataset in the document is not available, an indexing service is invoked to index the document.
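The decision flow in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the names `IndexedDocument`, `IndexRepository`, and `handle_request` are hypothetical, the repository is an in-memory dictionary, dataset hashes are passed in as ready-made strings, and "indexing" is reduced to recording a hash. It follows the abstract alone and leaves out the metadata comparison described in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    cuid: str              # collision resistant unique identifier
    last_modified: float   # timestamp recorded when the document was indexed
    dataset_hashes: tuple  # hashes of the datasets the document contains


@dataclass
class IndexRepository:
    # Hypothetical in-memory stand-in for the indexing server repository.
    documents: dict = field(default_factory=dict)     # cuid -> IndexedDocument
    dataset_hashes: set = field(default_factory=set)  # hashes already indexed


def handle_request(repo, cuid, last_modified, dataset_hashes):
    """Return (outcome, document) for one request routed to the indexing server."""
    # Validation step: CUID present and timestamp unchanged, so reuse the index.
    known = repo.documents.get(cuid)
    if known is not None and known.last_modified == last_modified:
        return "reused", known

    # Transformation step: find dataset hashes that are not yet available.
    missing = [h for h in dataset_hashes if h not in repo.dataset_hashes]

    # Indexing step: only datasets with a missing hash need indexing.
    repo.dataset_hashes.update(missing)  # placeholder for real indexing work
    document = IndexedDocument(cuid, last_modified, tuple(dataset_hashes))
    repo.documents[cuid] = document
    outcome = "indexed" if missing else "reused_datasets"
    return outcome, document
```

In this sketch, a first request for a document yields `"indexed"`, a repeat request with the same timestamp yields `"reused"`, and a request with a changed timestamp but already-known dataset hashes yields `"reused_datasets"`.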
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer-readable medium to store instructions, which when executed by a computer, cause the computer to perform operations comprising: route a query received at a server to an indexing server, wherein the query is generated by transforming a natural language input, and wherein the query is for accessing a document; in response to determining that the document is indexed and available in the indexing server: retrieve the document from the indexing server, wherein the determination is performed by services that are executed at the indexing server as background tasks; in response to executing logic implemented at a validation service, determine whether a collision resistant unique identifier (CUID) of the document is present in the indexing server; and in response to determining that the CUID of the document is present in the indexing server, determine whether a last modified timestamp of the document matches a timestamp of the document available at the indexing server; in response to the query and determining that the last modified timestamp of the document matches the timestamp of the document available at the indexing server, reuse the document previously indexed and available at the indexing server; and provide the document to the server from the indexing server for accessing the document the server.
2. The computer-readable medium of claim 1, further comprises instructions which when executed by the computer cause the computer to perform operations comprising: in response to determining that the document is not available in the indexing server: determining whether a hash of a dataset of the document is available in the indexing server based on executing logic implemented at a transformation service; in response to determining that the hash of the dataset of the document is available in the indexing server, comparing metadata associated with the dataset with metadata previously stored at the indexing server; and in response to determining that the metadata associated with the dataset is similar to a subset of the metadata previously stored at the indexing server, reusing the dataset from the document at the indexing server; and providing a document including the dataset indexed at the indexing server to the server for accessing.
3. The computer-readable medium of claim 2, further comprises instructions which when executed by the computer further cause the computer to: in response to determining that the hash of the dataset is not available in the indexing server: invoke an indexing service; index metadata associated with the dataset by a metadata indexing thread; index a list of values corresponding to the metadata in the dataset by a list of value indexing thread; and store the CUID of the document and the timestamp of the document on the indexing server.
4. The computer-readable medium of claim 3, wherein the hash of the dataset is based on a data source type and a dataset path.
5. The computer-readable medium of claim 4, further comprises instructions which when executed by the computer further cause the computer to: in response to determining that the dataset in the document is not accessed, pause indexing the dataset; and in response to determining that the dataset in the document is accessed, resume indexing the dataset.
6. The computer-readable medium of claim 5, further comprises instructions which when executed by the computer further cause the computer to: in response to determining that a first dataset in the document is accessed, identify the first dataset as an active dataset; in response to determining that a second dataset in the document is not accessed, identify the second dataset as an inactive dataset; and index the first dataset in the indexing server and not the second dataset.
7. A computer-implemented method of optimized indexing in natural language processing, the method comprising: routing a query received at a server to an indexing server, wherein the query is generated by transforming a natural language input, and wherein the query is for accessing a document; in response to determining that the document is indexed and available in the indexing server: retrieving the document from the indexing server, wherein the determination is performed by services that are executed at the indexing server as background tasks; in response to executing logic implemented at a validation service, determining whether a collision resistant unique identifier (CUID) of the document is present in the indexing server; and in response to determining that the CUID of the document is present in the indexing server, determining whether a last modified timestamp of the document matches a timestamp of the document available at the indexing server; in response to the query and determining that the last modified timestamp of the document matches the timestamp of the document available at the indexing server, reusing the document previously indexed and available at the indexing server; and providing the document to the server from the indexing server for accessing the document the server.
8. The method of claim 7, further comprising: in response to determining that the document is not available in the indexing server: determining whether a hash of a dataset of the document is available in the indexing server based on executing logic implemented at a transformation service; in response to determining that the hash of the dataset of the document is available in the indexing server, comparing metadata associated with the dataset with metadata previously stored at the indexing server; and in response to determining that the metadata associated with the dataset is similar to a subset of the metadata previously stored at the indexing server, reusing the dataset from the document at the indexing server; and providing a document including the dataset indexed at the indexing server to the server for accessing.
9. The method of claim 8, further comprising: in response to determining that the hash of the dataset is not available in the indexing server: invoking an indexing service; indexing metadata associated with the dataset by a metadata indexing thread; indexing a list of values corresponding to the metadata in the dataset by a list of value indexing thread; and storing the CUID of the document and the timestamp of the document on the indexing server.
10. The method of claim 9, wherein the hash of the dataset is based on a data source type and a dataset path.
11. The method of claim 10, further comprising: in response to determining that the dataset in the document is not accessed, pausing indexing the dataset; and in response to determining that the dataset in the document is accessed, resuming indexing the dataset.
12. The method of claim 11, further comprising: in response to determining that a first dataset in the document is accessed, identifying the first dataset as an active dataset; in response to determining that a second dataset in the document is not accessed, identifying the second dataset as an inactive dataset; and indexing the first dataset in the indexing server and not the second dataset.
13. A computer system for optimized indexing in natural language processing, comprising: a computer memory to store program code; and a processor to execute the program code to: route a query received at a server to an indexing server, wherein the query is generated by transforming a natural language input, and wherein the query is for accessing a document; in response to determining that the document is indexed and available in t
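Several dependent claims describe concrete mechanisms: a dataset hash based on a data source type and a dataset path (claims 4 and 10), separate threads for indexing metadata and lists of values (claims 3 and 9), and indexing only datasets that are accessed (claims 6 and 12). The Python sketch below illustrates these under stated assumptions and is not the patented implementation. The function names are hypothetical, SHA-256 is an arbitrary choice because the claims name no hash function, and treating "metadata" as column names and "list of values" as the distinct values per column is an interpretation. The pause and resume behavior of claims 5 and 11 is not modeled.

```python
import hashlib
import threading


def dataset_hash(source_type: str, dataset_path: str) -> str:
    """Derive a dataset hash from its data source type and path."""
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    key = f"{source_type}\x00{dataset_path}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def index_dataset(columns: dict) -> dict:
    """Index metadata and lists of values on two separate threads."""
    index = {"metadata": [], "values": {}}

    def index_metadata():
        # Assumption: metadata is modeled as the sorted column names.
        index["metadata"] = sorted(columns)

    def index_values():
        # Assumption: list of values is the distinct values per column.
        index["values"] = {
            name: sorted(set(values)) for name, values in columns.items()
        }

    threads = [
        threading.Thread(target=index_metadata),
        threading.Thread(target=index_values),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return index


def index_active_datasets(datasets: dict, accessed: set) -> dict:
    """Index only datasets that were accessed; inactive ones are skipped."""
    return {
        name: index_dataset(columns)
        for name, columns in datasets.items()
        if name in accessed
    }
```

Each thread writes to its own key of the shared result, so the two indexing tasks do not contend for the same entry. A production design would also need the pause and resume signalling that this sketch omits.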
Search customisation based on user profiles and personalisation · CPC title
Document management systems · CPC title
Temporal data queries · CPC title
Indexing; Data structures therefor; Storage structures (for retrieval from the web G06F16/951) · CPC title
Indexing structures · CPC title