Systems and methods for categorization of ingested database entries to determine topic frequency

US2022414123A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022414123-A1
Application number: US-202117362602-A
Country: US
Kind code: A1
Filing date: Jun 29, 2021
Priority date: Jun 29, 2021
Publication date: Dec 29, 2022
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A categorization system can include a computing device that is configured to obtain a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing. The computing device can also be configured to select a categorization model from a model database. The computing device can also be configured to, for each data item of the plurality of data items, apply the categorization model to the data item to identify at least one topic associated with the corresponding data item. The computing device can also be configured to generate a categorization visualization indicating a frequency of data items corresponding to each topic. The computing device can also be configured to transmit the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.
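
The flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here (`DataItem`, `MODEL_DATABASE`, `keyword_model`, `topic_frequencies`, `render_bar_chart`) is hypothetical and not taken from the publication; the keyword lookup is only a toy stand-in for a real categorization model, and the scheduling that would trigger a run when the analysis interval elapses is left out.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# All names below are hypothetical; the publication does not define an API.


@dataclass
class DataItem:
    text: str
    received_at: datetime


# A "model" here is any callable mapping item text to one or more topics.
Model = Callable[[str], List[str]]


def keyword_model(text: str) -> List[str]:
    """Toy stand-in for a categorization model (simple keyword lookup)."""
    lowered = text.lower()
    topics = []
    if "refund" in lowered:
        topics.append("refunds")
    if "delivery" in lowered or "shipping" in lowered:
        topics.append("delivery")
    return topics or ["other"]


# Assumed model database keyed by a parameter of the data items (a language code).
MODEL_DATABASE: Dict[str, Model] = {"en": keyword_model}


def topic_frequencies(
    items: List[DataItem],
    now: datetime,
    analysis_period: timedelta,
    language: str = "en",
) -> Counter:
    """Count how many items in the analysis period fall under each topic."""
    window_start = now - analysis_period
    recent = [i for i in items if window_start <= i.received_at <= now]
    model = MODEL_DATABASE[language]  # select the model once per batch
    counts: Counter = Counter()
    for item in recent:
        # set() ensures each topic is counted at most once per item.
        counts.update(set(model(item.text)))
    return counts


def render_bar_chart(counts: Counter) -> str:
    """Plain-text stand-in for the categorization visualization."""
    return "\n".join(
        f"{topic:<10} {'#' * n} ({n})" for topic, n in counts.most_common()
    )
```

In this sketch the "visualization" is just a text bar chart; a real system would presumably send a chart to an analyst's user interface or write the counts to a categorized database.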

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computing device configured to: obtain a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; select a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, apply the categorization model to the data item to identify at least one topic associated with the corresponding data item; generate a categorization visualization indicating a frequency of data items corresponding to each topic; and transmit the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.

2. The system of claim 1, wherein the computing device is configured to: for each data item of the plurality of data items, apply a sentiment model to the data item to identify a sentiment of the data item; and generate the categorization visualization to include the identified sentiment of each data item of the plurality of data items.

3. The system of claim 1, wherein the at least one parameter is a language of the data item.

4. The system of claim 3, wherein the categorization model corresponds to at least one language.

5. The system of claim 1, wherein the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.

6. The system of claim 1, wherein the categorization model, for each data item of the plurality of data items: compares the data item to a set of known topics; determines a similarity based on a distance value between each of known topic of the set of known topics and the data item; and categorizes the data item as a corresponding known topic of the set of known topics in response to the data item being within a threshold distance of the corresponding known topic.

7. The system of claim 6, wherein the categorization model identifies an unknown data item of the plurality of data items as unknown in response to each known topic of the set of known topics being outside the threshold distance.

8. The system of claim 7, wherein the computing device is configured to: access, via a distributed communications network, a public database; compare the unknown data item to data of the public database; identify a topic title of the unknown data item based on the comparison to data of the public database; and categorize the unknown data item as the topic title, wherein the unknown data item and the topic title are included in the categorization visualization.

9. A method comprising: obtaining a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; selecting a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, applying the categorization model to the data item to identify at least one topic associated with the corresponding data item; generating a categorization visualization indicating a frequency of data items corresponding to each topic; and transmitting the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.

10. The method of claim 9, further comprising: for each data item of the plurality of data items, applying a sentiment model to the data item to identify a sentiment of the data item; and generating the categorization visualization to include the identified sentiment of each data item of the plurality of data items.

11. The method of claim 9, wherein the at least one parameter is a language of the data item.

12. The method of claim 11, wherein the categorization model corresponds to at least one language.

13. The method of claim 9, wherein the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.

14. The method of claim 9, further comprising, for each data item of the plurality of data items: comparing, via the categorization model, the data item to a set of known topics; determining, via the categorization model, a similarity based on a distance value between each of known topic of the set of known topics and the data item; and categorizing, via the categorization model, the data item as a corresponding known topic of the set of known topics in response to the data item being within a threshold distance of the corresponding known topic.

15. The method of claim 14, further comprising identifying, via the categorization model, an unknown data item of the plurality of data items as unknown in response to each known topic of the set of known topics being outside the threshold distance.

16. The method of claim 15, further comprising: accessing, via a distributed communications network, a public database; comparing the unknown data item to data of the public database; identifying a topic title of the unknown data item based on the comparison to data of the public database; and categorizing the unknown data item as the topic title, wherein the unknown data item and the topic title are included in the categorization visualization.

17. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by at least one processor, cause a device to perform operations comprising: obtaining a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing, the plurality of data items corresponding to at least one parameter; selecting a categorization model from a model database based on the at least one parameter of the plurality of data items; for each data item of the plurality of data items, applying the categorization model to the data item to identify at least one topic associated with the corresponding data item; generating a categorization visualization indicating a frequency of data items corresponding to each topic; and transmitting the categorization visualization to at least one of: (i) a user interface of an analyst device and (ii) a categorized database.

18. The non-transitory computer readable medium of claim 17, wherein the instructions include: for each data item of the plurality of data items, applying a sentiment model to the data item to identify a sentiment of the data item; and generating the categorization visualization to include the identified sentiment of each data item of the plurality of data items.

19. The non-transitory computer readable medium of claim 17, wherein: the at least one parameter is a language of the data item, the categorization model corresponds to at least one language, and the categorization model implements a transformer-based machine learning model to determine the at least one topic corresponding to each data item of the plurality of data items.

20. The non-transitory computer readable medium of claim 17, wherein the instructions include, for each data item of the plurality of data items: comparing, via the categorizatio…
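
The threshold-distance step recited in claims 6, 7, 14, and 15 can be sketched as follows. This is an illustrative reading only, not the applicant's implementation: it assumes each data item and each known topic has already been turned into a numeric vector (for example by an embedding model), it uses cosine distance as the distance value, and it assigns the nearest topic when several fall within the threshold. The function names and the `"unknown"` label are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def categorize(
    item_vec: Sequence[float],
    known_topics: Dict[str, Sequence[float]],
    threshold: float,
) -> str:
    """Assign the nearest known topic, or "unknown" if none is close enough."""
    best_topic, best_dist = None, float("inf")
    for topic, topic_vec in known_topics.items():
        dist = cosine_distance(item_vec, topic_vec)
        if dist < best_dist:
            best_topic, best_dist = topic, dist
    # Assumption: "within a threshold distance" is read as dist <= threshold.
    if best_topic is not None and best_dist <= threshold:
        return best_topic
    return "unknown"


# Hypothetical two-dimensional topic vectors, for illustration only.
known = {"refunds": [1.0, 0.0], "delivery": [0.0, 1.0]}
near_refunds = categorize([0.9, 0.1], known, threshold=0.2)
far_from_all = categorize([1.0, 1.0], known, threshold=0.2)
```

An item labeled `"unknown"` here corresponds to the case in claims 7 and 15; the follow-up lookup against a public database in claims 8 and 16 is not modeled.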

Assignees

Walmart Apollo Llc

Classifications

  • Machine learning · CPC title

  • G06F16/285 · Primary

    Clustering or classification · CPC title

  • Market modelling; Market analysis; Collecting market data · CPC title

  • Semantic analysis · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022414123A1 cover?
A categorization system can include a computing device that is configured to obtain a plurality of data items over a threshold analysis period from an incoming data database in response to a threshold analysis interval elapsing. The computing device can also be configured to select a categorization model from a model database. The computing device can also be configured to, for each data item o…
Who is the assignee on this patent?
Walmart Apollo Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 29, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).