Who is the assignee on this patent?

Nelson Steven, Kitada Hiroshi, Wong Lana, and 1 more

What technology area does this patent fall under?

Primary CPC classification G10L15/1815. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 4, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Speech-to-text conversion for interactive whiteboard appliances using multiple services

US10553208B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10553208-B2
Application number	US-201715728367-A
Country	US
Kind code	B2
Filing date	Oct 9, 2017
Priority date	Oct 9, 2017
Publication date	Feb 4, 2020
Grant date	Feb 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Artificial intelligence is introduced into an electronic meeting context to perform various tasks before, during, and/or after electronic meetings. The artificial intelligence may analyze a wide variety of data such as data pertaining to other electronic meetings, data pertaining to organizations and users, and other general information pertaining to any topic. Capability is also provided to create, manage, and enforce meeting rules templates that specify requirements and constraints for various aspects of electronic meetings. Embodiments include improved approaches for translation and transcription using multiple translation/transcription services. Embodiments also include using sensors in conjunction with interactive whiteboard appliances to perform person detection, person identification, attendance tracking, and improved meeting start. Embodiments further include improvements to the presentation of content on interactive whiteboard appliances, providing meeting services for meeting attendees, agenda extraction, and learning to aid in creating new electronic meetings.
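The abstract's "multiple translation/transcription services" approach is spelled out in the first claim as selecting units of speech from several services according to their confidence scores. The following is a minimal Python sketch of that idea, not the patented implementation. The `Unit` type, the service labels, the sample scores, and the assumption that every service returns the same number of position-aligned units are all illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Unit:
    """One unit of speech (for example a word) with the score a service gave it."""
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]
    service: str       # hypothetical service label


def merge_by_confidence(results: Dict[str, List[Unit]]) -> List[Unit]:
    """For each position, keep the unit with the highest confidence score.

    Assumes every service returned the same number of units, aligned by
    position. Real services would need an alignment step first.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two services")
    if len({len(units) for units in results.values()}) != 1:
        raise ValueError("this sketch assumes position-aligned units")
    merged = []
    for candidates in zip(*results.values()):
        # On a tie, max() keeps the candidate from the service listed first.
        merged.append(max(candidates, key=lambda u: u.confidence))
    return merged


# Hypothetical outputs from two services for the same audio.
results = {
    "service_a": [
        Unit("please", 0.95, "service_a"),
        Unit("sand", 0.40, "service_a"),
        Unit("the", 0.90, "service_a"),
        Unit("agenda", 0.70, "service_a"),
    ],
    "service_b": [
        Unit("please", 0.80, "service_b"),
        Unit("send", 0.85, "service_b"),
        Unit("the", 0.88, "service_b"),
        Unit("agenda", 0.92, "service_b"),
    ],
}

merged = merge_by_confidence(results)
print(" ".join(u.text for u in merged))  # please send the agenda
print([u.service for u in merged])       # units alternate between services
```

In this example the merged result draws units from both services in alternation, which loosely corresponds to the "at least partially interleaved" wording of claim 3. The sketch does not enforce the claim's requirement that the result contain units from at least two services; it simply happens to in this sample.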

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus comprising: one or more processors; and one or more memories storing instructions which, when processed by the one or more processors, cause: retrieving audio data that represents human speech or text, selecting, from a plurality of translation/transcription services, two or more selected translation/transcription services to process the audio data that represents human speech or text, providing the audio data that represents human speech or text to the two or more selected translation/transcription services, receiving, from each translation/transcription service from the two or more selected translation/transcription services, translation/transcription data that includes a plurality of units of speech and a plurality of confidence scores for the plurality of units of speech, generating, based upon the plurality of confidence scores for the plurality of units of speech from each translation/transcription service, from the two or more selected translation/transcription services, and one or more selection criteria that include confidence scores, resulting translation/transcription data that includes a plurality of units of speech selected from the translation/transcription data received from the two or more selected translation/transcription services, wherein the resulting translation/transcription data includes units of speech from at least two different translation/transcription services from the two or more selected translation/transcription services, and providing the resulting translation/transcription data to a requestor device.
2. The apparatus of claim 1, wherein the selecting, from the plurality of translation/transcription services, two or more selected translation/transcription services to process the audio data that represents human speech or text, is performed based upon one or more factors that include one or more of language, context, speaker, location, or compliance requirement.
3. The apparatus of claim 1, wherein the units of speech from the at least two different translation/transcription services from two or more selected translation/transcription services are at least partially interleaved.
4. The apparatus of claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause: determining that the confidence score for a particular corresponding unit of speech, from each translation/transcription service from the two or more selected translation/transcription services, is below a specified threshold, and in response to determining, that the confidence score for a particular unit of speech, from each translation/transcription service from the two or more selected translation/transcription services, is below a specified threshold, selecting the particular unit of speech from the translation/transcription data for a particular translation/transcription service from the two or more selected translation/transcription services based upon configuration data that specifies one or more factors that include one or more of language, context, speaker, location, or compliance requirement.
5. The apparatus of claim 1, wherein the one or more selection criteria include one or more of language, context, geographical location, industry, organization information, speaker identification or classes of speaker.
6. The apparatus of claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause: identifying, from the plurality of units of speech included in the resulting translation/transcription data, a particular unit of speech with a confidence score that does not satisfy a specified threshold, and in response to identifying, from the plurality of units of speech included in the resulting translation/transcription data, the particular unit of speech with a confidence score that does not satisfy a specified threshold, designating the particular unit of speech for manual processing to improve accuracy.
7. The apparatus of claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause: identifying, from the plurality of units of speech included in the resulting translation/transcription data, a particular unit of speech with a confidence score that does not satisfy a specified threshold, and in response to identifying, from the plurality of units of speech included in the resulting translation/transcription data, the particular unit of speech with a confidence score that does not satisfy a specified threshold, modifying the particular unit of speech based upon one or more of organization-specific information or industry-specific information.
8. The apparatus of claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause: identifying, from the plurality of units of speech included in the resulting translation/transcription data, a particular unit of speech with a confidence score that does not satisfy a specified threshold, and in response to identifying, from the plurality of units of speech included in the resulting translation/transcription data, the particular unit of speech with a confidence score that does not satisfy a specified threshold, performing auto-correction on the particular unit of speech by: determining a known unit of speech that satisfies a similarity threshold with respect to the particular unit of speech, and replacing the particular unit of speech with the known unit of speech that satisfies the similarity threshold with respect to the particular unit of speech.
9. One or more non-transitory computer-readable media storing instructions which, when processed by one or more processors, cause: retrieving audio data that represents human speech or text, selecting, from a plurality of translation/transcription services, two or more selected translation/transcription services to process the audio data that represents human speech or text, providing the audio data that represents human speech or text to the two or more selected translation/transcription services, receiving, from each translation/transcription service from the two or more selected translation/transcription services, translation/transcription data that includes a plurality of units of speech and a plurality of confidence scores for the plurality of units of speech, generating, based upon the plurality of confidence scores for the plurality of units of speech from each translation/transcription service, from the two or more selected translation/transcription services, and one or more selection criteria that include confidence scores, resulting translation/transcription data that includes a plurality of units of speech selected from the translation/transcription data received from the two or more selected translation/transcription services, wherein the resulting translation/transcription data includes units of speech from at least two different translation/transcription services from the two or more selected translation/transcription services, and providing the resulting translation/transcription data to a requestor device.
10. The one or more non-transitory computer-readable media of claim 9, wherein the selecting, from the plurality of translation/transcription services, two or more selected translation/transcription services to process the audio data that represents human speech or text, is performed based upon one or more factors that include one or more of language, context, speaker, location, or compliance requirement.
11. The one or more non-transitory computer-readable media of claim 9
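Claims 6 and 8 describe two ways of handling a unit of speech whose confidence score does not satisfy a threshold: designating it for manual processing, or auto-correcting it to a known unit that satisfies a similarity threshold. The sketch below combines the two in one pass purely for illustration; the claims present them as separate options. The function name, the default threshold values, the sample vocabulary, and the use of Python's standard `difflib` similarity ratio as the similarity measure are all assumptions, since the claim preview does not specify how similarity is computed.

```python
import difflib
from typing import List, Sequence, Tuple


def handle_low_confidence(
    units: Sequence[Tuple[str, float]],
    known_terms: Sequence[str],
    confidence_threshold: float = 0.6,  # assumed value, not from the patent
    similarity_threshold: float = 0.8,  # assumed value, not from the patent
) -> Tuple[List[str], List[int]]:
    """Return (texts after auto-correction, indices flagged for manual review)."""
    corrected: List[str] = []
    needs_review: List[int] = []
    for index, (text, confidence) in enumerate(units):
        if confidence >= confidence_threshold:
            corrected.append(text)
            continue
        # Low-confidence unit: look for a known term that is similar enough.
        matches = difflib.get_close_matches(
            text, known_terms, n=1, cutoff=similarity_threshold
        )
        if matches:
            corrected.append(matches[0])  # auto-correction, as in claim 8
        else:
            corrected.append(text)
            needs_review.append(index)    # manual processing, as in claim 6
    return corrected, needs_review


# Hypothetical merged transcript as (text, confidence) pairs.
units = [("open", 0.90), ("the", 0.95), ("whitebord", 0.35), ("xqzt", 0.20)]
known_terms = ["whiteboard", "agenda", "transcript"]

corrected, needs_review = handle_low_confidence(units, known_terms)
print(corrected)     # ['open', 'the', 'whiteboard', 'xqzt']
print(needs_review)  # [3]
```

Here "whitebord" is close enough to the known term "whiteboard" to be replaced, while "xqzt" has no sufficiently similar known term and is flagged by its index instead.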

Classifications

G06F40/58
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
G06Q10/10
Office automation; Time management · CPC title
G10L2015/227
of the speaker; Human-factor methodology · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G06N5/027
Frames · CPC title

Patent family

Related publications grouped by family.

View patent family 63363888
