Computer-implemented method, device, and computer program product

US12182516B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12182516-B2
Application number  US-202117527798-A
Country             US
Kind code           B2
Filing date         Nov 16, 2021
Priority date       Oct 21, 2021
Publication date    Dec 31, 2024
Grant date          Dec 31, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a computer-implemented method, a device, and a computer program product. The method includes extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes. The second period is after the first period. Embodiments of the present disclosure can better predict the tendency of the themes appearing in the future based on the semantic information and frequencies of the themes.
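
The frequency side of the abstract can be illustrated with a minimal sketch. It assumes themes have already been extracted and attached to each document, and it counts, for every time interval point in the first period, the documents released on or before that point. The function name, the `(release_date, themes)` tuple layout, and the sample data are hypothetical and do not come from the patent.

```python
from datetime import date


def cumulative_theme_counts(docs, interval_points):
    """Map each theme to a list of document counts, one per interval point.

    docs: iterable of (release_date, set_of_themes) pairs (assumed layout).
    interval_points: ordered dates marking the time interval points.
    A document is counted at a point if it was released on or before it.
    """
    docs = list(docs)
    themes = sorted({t for _, doc_themes in docs for t in doc_themes})
    return {
        theme: [
            sum(
                1
                for released, doc_themes in docs
                if released <= point and theme in doc_themes
            )
            for point in interval_points
        ]
        for theme in themes
    }


# Hypothetical sample data for illustration only.
docs = [
    (date(2021, 1, 5), {"storage"}),
    (date(2021, 1, 20), {"storage", "backup"}),
    (date(2021, 2, 10), {"backup"}),
]
points = [date(2021, 1, 31), date(2021, 2, 28)]
counts = cumulative_theme_counts(docs, points)
```

Each resulting list is a per-theme time sequence over the first period. In the claimed method such a sequence would be encoded and passed, together with semantic information, to a separate prediction model; that step is not shown here.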

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents utilizing at least a first machine learning model, the first machine learning model comprising a transformer-based model; and determining the number of documents associated with the themes within a second period according to a second machine learning model, different than the first machine learning model, the second machine learning model comprising a time sequence prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period; wherein the first machine learning model is configured to encode the semantic information of the themes and the frequencies of the themes appearing in the set of documents; wherein results of the encoding of the semantic information of the themes and the frequencies of the themes comprises, for each of at least a subset of the themes, (i) a semantic encoding component comprising at least a classification token for the semantic information of the theme, and (ii) a frequency encoding component comprising a plurality of frequency-related values generated for the frequency of appearance of the theme based on a designated probability function; wherein the second machine learning model is configured to process the semantic encoding components and the frequency encoding components to determine the number of documents associated with the themes within the second period; wherein determining the frequencies comprises determining a time sequence of frequency representations of the themes within the first period; wherein determining the time sequence of frequency representations comprises: for a time interval point within the first period, determining a frequency representation of the time sequence of frequency representations at the time interval point based on the number of documents corresponding to the themes; and wherein determining the frequency representation at the time interval point comprises: determining the frequency representation by using a position extending code based on the number of documents corresponding to the themes.

2. The method according to claim 1, wherein determining the semantic information comprises: determining a time sequence of semantic representations of the themes within the first period.

3. The method according to claim 2, wherein determining the time sequence of semantic representations comprises: for a time interval point within the first period, determining a semantic representation of the time sequence of semantic representations at the time interval point according to a semantic encoding model and based on words or words in phrases corresponding to the themes in documents with release time not later than the time interval point in the set of documents.

4. The method according to claim 1, wherein determining the frequency representation of the time sequence of frequency representations at the time interval point is based on the number of documents corresponding to the themes in documents with release time not later than the time interval point in the set of documents.

5. The method according to claim 1, wherein the frequency representation has a predefined dimension which is greater than one dimension.

6. The method according to claim 1, wherein extracting the respective themes of the set of documents comprises: extracting a predefined number of respective themes of the set of documents by using a theme classifying model.

7. The method according to claim 1, wherein determining the number of the documents associated with the themes within the second period comprises: determining a number time sequence of the themes within the second period, wherein the number time sequence comprises the number of documents associated with the themes at each time interval point within the second period.

8. An electronic device, comprising: at least one processor; and at least one memory storing computer program instructions, wherein the computer program instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents utilizing at least a first machine learning model, the first machine learning model comprising a transformer-based model; and determining the number of documents associated with the themes within a second period according to a second machine learning model, different than the first machine learning model, the second machine learning model comprising a time sequence prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period; wherein the first machine learning model is configured to encode the semantic information of the themes and the frequencies of the themes appearing in the set of documents; wherein results of the encoding of the semantic information of the themes and the frequencies of the themes comprises, for each of at least a subset of the themes, (i) a semantic encoding component comprising at least a classification token for the semantic information of the theme, and (ii) a frequency encoding component comprising a plurality of frequency-related values generated for the frequency of appearance of the theme based on a designated probability function; wherein the second machine learning model is configured to process the semantic encoding components and the frequency encoding components to determine the number of documents associated with the themes within the second period; wherein determining the frequencies comprises determining a time sequence of frequency representations of the themes within the first period; wherein determining the time sequence of frequency representations comprises: for a time interval point within the first period, determining a frequency representation of the time sequence of frequency representations at the time interval point based on the number of documents corresponding to the themes; and wherein determining the frequency representation at the time interval point comprises: determining the frequency representation by using a position extending code based on the number of documents corresponding to the themes.

9. The electronic device according to claim 8, wherein determining the semantic information comprises: determining a time sequence of semantic representations of the themes within the first period.

10. The electronic device according to claim 9, wherein determining the time sequence of semantic representations comprises: for a time interval point within the first period, determining a semantic representation of the time sequence of semantic representations at the time interval point according to a semantic encoding model and based on words or words in phrases corresponding to the themes in documents with release time not later than the time interval point in the set of documents.

11. The electronic device according to claim 8, wherein determining the frequency representation of the time sequence of frequency representations at the time interval point is based on the number of documents corresponding to the themes in documents with release time not later than the time interval point in the set of documents.

12. The electronic device according to claim 8, where
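
Claims 1 and 5 describe turning a document count into a frequency representation with more than one dimension by way of a "position extending code". The claim text shown here does not give the formula, so the sketch below substitutes the sinusoidal positional encoding commonly used in transformer models as one plausible reading. The function name `expand_count`, the default dimension of 8, and the base of 10000 are assumptions for illustration, not details from the patent.

```python
import math


def expand_count(count, dim=8, base=10000.0):
    """Expand a scalar document count into a dim-sized vector.

    Assumption: sinusoidal positional encoding stands in for the
    patent's "position extending code", whose exact form is not
    given in the claim text. Even indices hold sines, odd indices
    hold cosines, at geometrically decreasing frequencies.
    """
    if dim < 2 or dim % 2:
        raise ValueError("dim must be an even number >= 2")
    vec = []
    for i in range(dim // 2):
        angle = count / (base ** (2 * i / dim))
        vec.extend((math.sin(angle), math.cos(angle)))
    return vec


# A hypothetical count sequence for one theme over three interval points.
series = [expand_count(n) for n in [2, 2, 5]]
```

Under this reading, each time interval point contributes one fixed-size vector, so a theme's history becomes a sequence of vectors that could be combined with a semantic encoding before being handed to a time sequence prediction model.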

Assignees

Inventors

Classifications

  • Knowledge representation; Symbolic representation · CPC title

  • Combinations of networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12182516B2 cover?
Embodiments of the present disclosure relate to a computer-implemented method, a device, and a computer program product. The method includes extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents a…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 31, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).