Semantic text comparison using artificial intelligence identified source document topics
US-2022374598-A1 · Nov 24, 2022 · US
US2022414123A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022414123-A1 |
| Application number | US-202117362602-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 29, 2021 |
| Priority date | Jun 29, 2021 |
| Publication date | Dec 29, 2022 |
| Grant date | — |
Official abstract text for this publication.
A categorization system can include a computing device that is configured to obtain a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing. The computing device can also be configured to select a categorization model from a model database. The computing device can also be configured to, for each data item of the plurality of data items, apply the categorization model to the data item to identify at least one topic associated with the corresponding data item. The computing device can also be configured to generate a categorization visualization indicating a frequency of data items corresponding to each topic. The computing device can also be configured to transmit the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
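The flow described in the abstract (collect items for an analysis period, select a model, tag each item with topics, then report topic frequencies) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `MODEL_DB` dictionary stands in for the model database, the keyword matcher stands in for whatever categorization model is actually used, language is used as the selection parameter, and a text bar chart stands in for the categorization visualization.

```python
from collections import Counter
from datetime import datetime, timedelta


def keyword_model(keywords):
    """Build a toy categorization model: text -> list of matching topics."""
    def model(text):
        lowered = text.lower()
        topics = [topic for topic, words in keywords.items()
                  if any(word in lowered for word in words)]
        return topics or ["unknown"]
    return model


# Hypothetical "model database", keyed by a language parameter.
MODEL_DB = {
    "en": keyword_model({"shipping": ["delivery", "late"],
                         "pricing": ["price", "cost"]}),
    "es": keyword_model({"shipping": ["entrega"], "pricing": ["precio"]}),
}


def categorize_window(items, now, period, language):
    """Count topic frequencies for items of one language received in the period."""
    model = MODEL_DB[language]      # select the model based on the parameter
    start = now - period            # start of the analysis period
    counts = Counter()
    for item in items:
        if item["language"] == language and start <= item["received"] <= now:
            counts.update(model(item["text"]))
    return counts


def render_bars(counts):
    """Render topic frequencies as a plain-text bar chart."""
    return "\n".join(f"{topic:<10} {'#' * n} ({n})"
                     for topic, n in counts.most_common())


# Illustrative data only; not taken from the publication.
now = datetime(2021, 6, 29, 12, 0)
items = [
    {"text": "My delivery was late", "language": "en",
     "received": now - timedelta(hours=1)},
    {"text": "The price is too high", "language": "en",
     "received": now - timedelta(hours=2)},
    {"text": "Late delivery and high cost", "language": "en",
     "received": now - timedelta(hours=3)},
    {"text": "Old delivery complaint", "language": "en",
     "received": now - timedelta(days=3)},
    {"text": "La entrega llegó tarde", "language": "es",
     "received": now - timedelta(hours=1)},
]

english_counts = categorize_window(items, now, timedelta(days=1), "en")
print(render_bars(english_counts))
```

In this sketch the three-day-old item falls outside the one-day analysis period and is not counted, and a single item can contribute to more than one topic, which matches the "at least one topic" wording of the abstract.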
Opening claim text (preview).
What is claimed is:
1. A system comprising: a computing device configured to: obtain a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; select a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, apply the categorization model to the data item to identify at least one topic associated with the corresponding data item; generate a categorization visualization indicating a frequency of data items corresponding to each topic; and transmit the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
2. The system of claim 1, wherein the computing device is configured to: for each data item of the plurality of data items, apply a sentiment model to the data item to identify a sentiment of the data item; and generate the categorization visualization to include the identified sentiment of each data item of the plurality of data items.
3. The system of claim 1, wherein the at least one parameter is a language of the data item.
4. The system of claim 3, wherein the categorization model corresponds to at least one language.
5. The system of claim 1, wherein the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.
6. The system of claim 1, wherein the categorization model, for each data item of the plurality of data items: compares the data item to a set of known topics; determines a similarity based on a distance value between each of known topic of the set of known topics and the data item; and categorizes the data item as a corresponding known topic of the set of known topics in response to the data item being within a threshold distance of the corresponding known topic.
7. The system of claim 6, wherein the categorization model identifies an unknown data item of the plurality of data items as unknown in response to each known topic of the set of known topics being outside the threshold distance.
8. The system of claim 7, wherein the computing device is configured to: access, via a distributed communications network, a public database; compare the unknown data item to data of the public database; identify a topic title of the unknown data item based on the comparison to data of the public database; and categorize the unknown data item as the topic title, wherein the unknown data item and the topic title are included in the categorization visualization.
9. A method comprising: obtaining a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; selecting a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, applying the categorization model to the data item to identify at least one topic associated with the corresponding data item; generating a categorization visualization indicating a frequency of data items corresponding to each topic; and transmitting the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
10. The method of claim 9, further comprising: for each data item of the plurality of data items, applying a sentiment model to the data item to identify a sentiment of the data item; and generating the categorization visualization to include the identified sentiment of each data item of the plurality of data items.
11. The method of claim 9, wherein the at least one parameter is a language of the data item.
12. The method of claim 11, wherein the categorization model corresponds to at least one language.
13. The method of claim 9, wherein the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.
14. The method of claim 9, further comprising, for each data item of the plurality of data items: comparing, via the categorization model, the data item to a set of known topics; determining, via the categorization model, a similarity based on a distance value between each of known topic of the set of known topics and the data item; and categorizing, via the categorization model, the data item as a corresponding known topic of the set of known topics in response to the data item being within a threshold distance of the corresponding known topic.
15. The method of claim 14, further comprising identifying, via the categorization model, an unknown data item of the plurality of data items as unknown in response to each known topic of the set of known topics being outside the threshold distance.
16. The method of claim 15, further comprising: accessing, via a distributed communications network, a public database; comparing the unknown data item to data of the public database; identifying a topic title of the unknown data item based on the comparison to data of the public database; and categorizing the unknown data item as the topic title, wherein the unknown data item and the topic title are included in the categorization visualization.
17. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by at least one processor, cause a device to perform operations comprising: obtaining a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; selecting a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, applying the categorization model to the data item to identify at least one topic associated with the corresponding data item; generating a categorization visualization indicating a frequency of data items corresponding to each topic; and transmitting the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
18. The non-transitory computer readable medium of claim 17, wherein the instructions include: for each data item of the plurality of data items, applying a sentiment model to the data item to identify a sentiment of the data item; and generating the categorization visualization to include the identified sentiment of each data item of the plurality of data items.
19. The non-transitory computer readable medium of claim 17, wherein: the at least one parameter is a language of the data item, the categorization model corresponds to at least one language, and the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.
20. The non-transitory computer readable medium of claim 17, wherein the instructions include, for each data item of the plurality of data items: comparing, via the categorizatio
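The threshold-distance step recited in claims 6 and 7 (and mirrored in claims 14 and 15) can be sketched as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the bag-of-words `embed` function is a stand-in for the transformer-based model mentioned in claim 5, cosine distance is one possible choice of distance value, and the topic names, seed phrases, and the `threshold` default of 0.8 are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (an assumption)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Hypothetical set of known topics, each represented by a seed phrase.
KNOWN_TOPICS = {
    "shipping": embed("package delivery shipping late arrived"),
    "refunds": embed("refund money return charged"),
}


def categorize(text, threshold=0.8):
    """Return the nearest known topic within the threshold, else 'unknown'."""
    item = embed(text)
    distances = {topic: cosine_distance(item, vector)
                 for topic, vector in KNOWN_TOPICS.items()}
    topic, distance = min(distances.items(), key=lambda pair: pair[1])
    return topic if distance <= threshold else "unknown"


print(categorize("my package arrived late"))
print(categorize("please refund my money"))
print(categorize("the app crashes on login"))
```

The sketch assigns only the single nearest topic; the claim language would also permit assigning every known topic that falls within the threshold. An item reported as unknown would then be a candidate for the public-database lookup described in claims 8 and 16, which is not modeled here.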
Machine learning · CPC title
Clustering or classification · CPC title
Market modelling; Market analysis; Collecting market data · CPC title
Semantic analysis · CPC title
Combinations of networks · CPC title