Organizational-based language model generation

US11676576B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11676576-B2
Application number: US-202117400055-A
Country: US
Kind code: B2
Filing date: Aug 11, 2021
Priority date: May 2, 2019
Publication date: Jun 13, 2023
Grant date: Jun 13, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for acquiring training data and building an organizational-based language model based on the training data. Organizational data is generated via one or more applications associated with an organization; the collected organizational data is aggregated and filtered into training data, which is used to train an organizational-based language model for speech processing.
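The aggregate, filter, and train flow summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the patent: the `OrgRecord` type, its `source_app` and `org_wide` fields, and the use of a bigram counter as a stand-in for real language model training.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgRecord:
    """Hypothetical record of text produced by one of an organization's applications."""
    source_app: str   # illustrative application name, e.g. "email" or "chat"
    text: str
    org_wide: bool    # assumed flag: True if every user of the organization may read it


def build_training_data(records):
    """Aggregate records from all applications and keep only organization-wide text."""
    return [r.text.lower().split() for r in records if r.org_wide and r.text.strip()]


def train_bigram_model(sentences):
    """Count bigrams; a deliberately simple stand-in for language model training."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


records = [
    OrgRecord("email", "Project Falcon ships Friday", org_wide=True),
    OrgRecord("chat", "Falcon ships after review", org_wide=True),
    OrgRecord("email", "Private salary notes", org_wide=False),
]
model = train_bigram_model(build_training_data(records))
print(model[("falcon", "ships")])        # 2
print(("private", "salary") in model)    # False
```

In this sketch the restricted record never reaches the model, so organization-specific phrases such as "falcon ships" gain weight while restricted text contributes nothing.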

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: a memory configured to store an organizational language model trained on data that is accessible to users of an organization; and a processor coupled to the memory and configured to receive input from a user of the organization via an application hosted by a host platform in which the organization is a tenant, recognize speech from the received input based on execution of the organizational language model via the host platform, and transmit the recognized speech to the application.

2. The computing system of claim 1, wherein the data that is accessible to the users of the organization is labeled with an identifier to distinguish it from data that is not accessible to the users of the organization.

3. The computing system of claim 1, wherein the data that is accessible to the users of the organization comprises group data of the organization that is filtered to remove group data that is not available to the users of the organization.

4. The computing system of claim 1, wherein the processor is further configured to retrain the organizational language model via execution of the organizational language model based on the recognized speech.

5. The computing system of claim 1, wherein the processor is further configured to remove a subset of data that is previously used to train the organizational language model from among the data that is available to the users of the organization and that is older than a predetermined age limit, and retrain the organizational language model based on remaining data within the data that is available to the users of the organization.

6. The system of claim 5, wherein the processor is further configured to identify the subset of data based on a time stamp that is stored with the subset of data in the memory.

7. The computing system of claim 1, wherein the processor is further configured to remove a subset of data that is previously used to train the organizational language model from among the data that is available to the users of the organization and that has changed to data that is not available to the users of the organization, and retrain the organizational language model based on remaining data within the data that is available to the users of the organization.

8. The system of claim 7, wherein the processor is further configured to identify the subset of data based on a group identifier that is stored with the subset of data in the memory.

9. A method comprising: storing an organizational language model trained on data that is accessible to users of an organization; receiving input from a user of the organization via an application hosted by a host platform in which the organization is a tenant; recognizing speech from the received input based on execution of the organizational language model via the host platform; and transmitting the recognized speech to the application.

10. The method of claim 9, wherein the data that is accessible to the users of the organization is labeled with an identifier to distinguish it from data that is not accessible to the users of the organization.

11. The method of claim 9, wherein the data that is accessible to the users of the organization comprises group data of the organization that is filtered to remove group data that is not available to the users of the organization.

12. The method of claim 9, wherein the method further comprises retraining the organizational language model via execution of the organizational language model based on the recognized speech.

13. The method of claim 9, wherein the method further comprises removing a subset of data that is previously used to train the organizational language model from among the data that is available to the users of the organization and that is older than a predetermined age limit, and retraining the organizational language model based on remaining data within the data that is available to the users of the organization.

14. The method of claim 13, wherein the method further comprises identifying the subset of data based on a time stamp that is stored with the subset of data in the memory.

15. The method of claim 9, wherein the method further comprises removing a subset of data that is previously used to train the organizational language model from among the data that is available to the users of the organization and that has changed to data that is not available to all of the users of the organization, and retraining the organizational language model based on remaining data within the data that is available to the users of the organization.

16. The method of claim 15, wherein the method further comprises identifying the subset of data based on a group identifier that is stored with the subset of data in the memory.

17. A computing system comprising: a memory configured to store an organizational language model trained on data that is accessible to users of an organization; and a processor coupled to the memory and configured to determine that the data that is accessible to the users of the organization has changed, retrain the organizational language model based on the data that is accessible to the users of the organization that has changed, and store the retrained organizational language model in the memory.

18. The computing system of claim 17, wherein the processor is configured to determine that a subset of data from among the data that is accessible to the users of the organization is older than a predetermined age limit, and remove the subset of data from the data that is accessible to the users of the organization to generate the data that is accessible to the users of the organization that has changed.

19. The computing system of claim 17, wherein the processor is further configured to determine that a subset of data from among the data that is accessible to the users of the organization has changed to data that is not available to the users of the organization, and remove the subset of data from the data that is accessible to the users of the organization to generate the data that is accessible to the users of the organization that has changed.

20. The computing system of claim 17, wherein the processor is further configured to receive input from a user of the organization via an application hosted by a host platform in which the organization is a tenant, and recognize speech from the received input based on execution of the retrained organizational language model via the host platform.
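The pruning steps recited in the dependent claims (removing data past an age limit using a stored time stamp, and removing data that is no longer available to the organization's users using a stored group identifier) might look roughly like the sketch below. The `TrainingItem` type, the `org_wide_groups` set, the 365-day limit, and the group names are hypothetical choices made for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TrainingItem:
    """Hypothetical training item carrying the metadata the claims refer to."""
    text: str
    timestamp: datetime   # time stamp stored with the item
    group_id: str         # group identifier stored with the item


def prune_training_data(items, now, max_age, org_wide_groups):
    """Drop items older than max_age or whose group is no longer organization-wide."""
    kept = []
    for item in items:
        too_old = now - item.timestamp > max_age
        restricted = item.group_id not in org_wide_groups
        if not (too_old or restricted):
            kept.append(item)
    return kept


def needs_retraining(before, after):
    """Signal a retrain only when pruning actually changed the training set."""
    return len(before) != len(after)


now = datetime(2023, 6, 13, tzinfo=timezone.utc)
items = [
    TrainingItem("quarterly roadmap review", now - timedelta(days=10), "all-hands"),
    TrainingItem("legacy product names", now - timedelta(days=400), "all-hands"),
    TrainingItem("merger planning notes", now - timedelta(days=5), "exec-only"),
]
kept = prune_training_data(items, now, timedelta(days=365), {"all-hands"})
print([item.text for item in kept])   # ['quarterly roadmap review']
print(needs_retraining(items, kept))  # True
```

Here the 400-day-old item fails the age check and the "exec-only" item fails the access check, so only one item remains and a retrain on the remaining data is triggered.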

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Office automation; Time management · CPC title

  • G10L15/063 · Primary

    Training · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Learning methods · CPC title

  • using context dependencies, e.g. language models · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11676576B2 cover?
Systems and methods are provided for acquiring training data and building an organizational-based language model based on the training data. Organizational data is generated via one or more applications associated with an organization; the collected organizational data is aggregated and filtered into training data that is used for training an organizational-based language model for speech pr…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 13, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).