Three-dimensional latent semantic analysis

US9734144B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9734144-B2
Application number  US-201414891810-A
Country             US
Kind code           B2
Filing date         Sep 18, 2014
Priority date       Sep 18, 2014
Publication date    Aug 15, 2017
Grant date          Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some examples, a computing system may access multiple information files, generate term-passage matrix data based on the multiple information files, and decompose the term-passage matrix data to generate a reduced-dimensional semantic space, which may be used for information retrieval.
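
As a rough illustration of the pipeline the abstract describes, the sketch below builds a word-by-passage count matrix from three toy passages and reduces it with a truncated SVD. NumPy, the helper names (`build_term_passage_matrix`, `reduce_space`), the sample passages, and the choice of `k=2` are all assumptions made for illustration. The abstract does not name a decomposition method; SVD is used here only because claim 4 lists it as one approach.

```python
import numpy as np

def build_term_passage_matrix(passages):
    """Rows are distinct words, columns are passages, cells are raw counts.

    Hypothetical helper: whitespace tokenization and raw counts are
    simplifying assumptions, not details taken from the patent.
    """
    vocab = sorted({w for p in passages for w in p.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(passages)))
    for j, passage in enumerate(passages):
        for word in passage.lower().split():
            matrix[index[word], j] += 1
    return vocab, matrix

def reduce_space(matrix, k):
    """Truncated SVD: keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

# Toy corpus (assumed data, one passage per string).
passages = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets closed",
]
vocab, matrix = build_term_passage_matrix(passages)
u_k, s_k, vt_k = reduce_space(matrix, k=2)

print(matrix.shape)  # (number of distinct words, number of passages)
print(u_k.shape)     # word coordinates in the reduced space
print(vt_k.T.shape)  # passage coordinates in the reduced space
```

Each row of `u_k` places a word, and each row of `vt_k.T` places a passage, in the same `k`-dimensional space, which is what makes similarity comparisons for retrieval possible.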

First claim

Opening claim text (preview).

We claim:

1. A method, comprising: accessing, by one or more processors, a plurality of information files; generating, by the one or more processors, term-passage matrix data based on the plurality of information files; decomposing the term-passage matrix data to generate a reduced-dimensional semantic space, wherein a number of rows of the term-passage matrix data corresponds to a sum of a number of distinct words in the plurality of information files and a number of distinct word pairs in the plurality of information files, wherein a number of columns of the term-passage matrix data corresponds to a number of passages in the plurality of information files, and wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of the distinct words in the plurality of information files and a frequency of occurrence of each individual word pair of the distinct word pairs in the plurality of information files; responsive to a query, determining a pseudo object associated with the query in the reduced-dimensional semantic space; examining one or more similarities between the pseudo object and words in the plurality of information files in the reduced-dimensional semantic space; and determining a passage from the plurality of information files based on the one or more similarities.

2. The method of claim 1, wherein a semantic distance of an individual word pair of the distinct word pairs is selected based on a particular condition associated with computational complexity of decomposition of the term-passage matrix data.

3. The method of claim 1, wherein an individual word pair of the distinct word pairs is a non-consecutive word pair.

4. The method of claim 1, wherein the decomposing comprises decomposing using a singular value decomposition (SVD) approach.

5. The method of claim 4, further comprising: determining the number of the distinct word pairs based on an application of an algorithm associated with the SVD approach using each value of 1, 2^1, 2^2, ..., 2^i in turn such that the algorithm produces a set of eigenvalues that comprises a previous set of eigenvalues to a particular extent, wherein i is a non-zero integer.

6. The method of claim 5, wherein the algorithm comprises an algorithm associated with an iterative method or an eigenvalue algorithm.

7. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions executable by one or more processors to perform operations comprising: generating term-passage matrix data to represent a plurality of information files, wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of a plurality of distinct words in the plurality of information files and further indicates a frequency of occurrence of each individual word combination of a plurality of distinct word combinations in the plurality of information files; decomposing the term-passage matrix data to generate a reduced-dimensional semantic space; in response to a query, determining a pseudo object associated with the query in the reduced-dimensional semantic space; examining one or more similarities between the pseudo object and words in the plurality of the information files in the reduced-dimensional semantic space; and determining a passage from the plurality of information files based on the one or more similarities.

8. The non-transitory computer-readable storage medium of claim 7, wherein a number of rows of the term-passage matrix data corresponds to a sum of a number of the plurality distinct words in the plurality of information files and a number of the plurality of distinct word combinations in the plurality of information files, and wherein a number of columns of the term-passage matrix data corresponds to a number of passages in the plurality of information files.

9. The non-transitory computer-readable storage medium of claim 7, wherein the plurality of distinct word combinations comprises a plurality of distinct word pairs.

10. The non-transitory computer-readable storage medium of claim 7, further comprising: responsive to the query, translating the query into a vector representation using the reduced-dimensional semantic space; and comparing the vector representation of the query and vector representation of one or more passages in the plurality of information files in the reduced-dimensional semantic space.

11. The non-transitory computer-readable storage medium of claim 7, further comprising at least partially causing at least one of: a summarization of the plurality of information files using the reduced-dimensional semantic space, a document comparison between a file and the plurality of information files using the reduced-dimensional semantic space, or a domain specific search using the reduced-dimensional semantic space.

12. An apparatus, comprising: one or more processors; and a memory configured to store a plurality of components executable by the one or more processors, the plurality of components comprising: an information accessing module configured to access a plurality of information files; a latent semantic analysis (LSA) module configured to: generate term-passage matrix data based on the plurality of information files, wherein a number of rows of the term-passage matrix data corresponds to a sum of a number of distinct words in the plurality of information files and a number of distinct word pairs in the plurality of information files, wherein a number of columns of the term-passage matrix data corresponds to a number of passages in the plurality of information files, and wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of the distinct words in the plurality of information files and a frequency of occurrence of each individual word pair of the distinct word pairs in the plurality of information files; and generate a reduced-dimensional semantic space based on the term-passage matrix data; and an information retrieval module configured to: responsive to a query, determine a pseudo object associated with the query in the reduced-dimensional semantic space; examine one or more similarities between the pseudo object and words in the plurality of information files in the reduced-dimensional semantic space; and determine a passage in the plurality of information files based on the one or more similarities.

13. The apparatus of claim 12, wherein the LSA module is further configured to generate the reduced-dimensional semantic space based on the term-passage matrix data by decomposition of the term-passage matrix data to generate the reduced-dimensional semantic space by use of a singular value decomposition (SVD) approach.

14. The apparatus of claim 13, wherein the LSA module is further configured to determine the number of distinct word pairs based on an application of an algorithm associated with the SVD approach by use of each value of 1, 2^1, 2^2, ..., 2^i in turn such that the algorithm produces a set of eigenvalues that comprises a previous set of eigenvalues to a particular extent, and wherein i is a non-zero integer.

15. The apparatus of claim 12, wherein a semantic distance of an individual word pair of the distinct word pairs is selected based on a particular condition associated with computational complexity of decomposition of the term-passage matrix data.

16. The apparatus of claim 12, wherein the plurality of components further comprises at least one of: a summarizing module configured to at least partially cause a summarization of the plurality o
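
The claims describe a matrix whose rows cover both distinct words and distinct word pairs, and a query that is mapped to a "pseudo object" in the reduced space. A minimal sketch of that idea follows, under several stated assumptions: word pairs are taken as ordered pairs of words at most `max_distance` positions apart within a passage (a hypothetical stand-in for the "semantic distance" of claim 2), the pseudo object is computed with the standard LSA folding-in formula q^T U_k S_k^-1 (the claims do not give a formula), and the query is compared directly with passage vectors in the style of claim 10 rather than through the word-similarity step of claim 1. NumPy, all function names, and the toy passages are illustrative only.

```python
from itertools import combinations
import numpy as np

def terms_of(text, max_distance=2):
    """Single words plus word pairs at most max_distance positions apart.

    max_distance is a hypothetical parameter; pairs with a gap of 2 are
    non-consecutive word pairs.
    """
    words = text.lower().split()
    pairs = [
        (words[i], words[j])
        for i, j in combinations(range(len(words)), 2)
        if j - i <= max_distance
    ]
    return words + pairs

def build_matrix(passages, max_distance=2):
    """Rows: distinct words and distinct word pairs. Columns: passages."""
    index = {}
    for passage in passages:
        for term in terms_of(passage, max_distance):
            index.setdefault(term, len(index))
    matrix = np.zeros((len(index), len(passages)))
    for j, passage in enumerate(passages):
        for term in terms_of(passage, max_distance):
            matrix[index[term], j] += 1
    return index, matrix

def rank_passages(query, index, matrix, k=2, max_distance=2):
    """Fold the query into the reduced space and score every passage."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k, v_k = u[:, :k], s[:k], vt[:k, :].T
    q = np.zeros(len(index))
    for term in terms_of(query, max_distance):
        if term in index:  # terms unseen in the corpus are ignored
            q[index[term]] += 1
    pseudo = (q @ u_k) / s_k  # folding-in: q^T U_k S_k^-1
    norms = np.linalg.norm(v_k, axis=1) * np.linalg.norm(pseudo)
    return (v_k @ pseudo) / np.where(norms == 0, 1.0, norms)

# Toy corpus (assumed data): two passages about a cat, two about stocks.
passages = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stocks fell as markets closed",
    "stocks rose as markets opened",
]
index, matrix = build_matrix(passages)
sims = rank_passages("cat on the mat", index, matrix, k=2)
print(matrix.shape)
print(np.round(sims, 3))
```

With this toy corpus and `k=2`, the query scores close to 1 against both cat passages and close to 0 against the two stock passages. The two cat passages collapse onto the same direction at this rank, so their relative order carries no meaning here.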


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9734144B2 cover?
In some examples, a computing system may access multiple information files, generate term-passage matrix data based on the multiple information files, and decompose the term-passage matrix data to generate a reduced-dimensional semantic space, which may be used for information retrieval.
Who is the assignee on this patent?
Dascălu Mihai, Ash David Walter, Empire Technology Dev LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).