Who is the assignee on this patent?

Joyson Safety Systems Acquisition LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/25. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 18, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method of correlating mouth images to input commands

US10748542B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10748542-B2
Application number	US-201815934721-A
Country	US
Kind code	B2
Filing date	Mar 23, 2018
Priority date	Mar 23, 2017
Publication date	Aug 18, 2020
Grant date	Aug 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for automated speech recognition utilizes computer memory, a processor executing imaging software and audio processing software, and a camera transmitting images of a physical source of speech input. Audio processing software includes an audio data stream of audio samples derived from at least one speech input. At least one timer is configured to transmit elapsed time values as measured in response to respective triggers received by the timer. The audio processing software is configured to assert and de-assert the timer triggers to measure respective audio sample times and interim period times between the audio samples. The audio processing software is further configured to compare the interim period times with a command spacing time value corresponding to an expected interim time value between commands, thereby determining if the speech input is command data or non-command data.
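The timing comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact decision rule, so the rule here (every interim period must fall within a tolerance of the command spacing time value) is an assumption, as are the names `AudioSample`, `interim_periods`, `is_command_data`, and the `tolerance` parameter.

```python
from dataclasses import dataclass


@dataclass
class AudioSample:
    """One audio sample, timed by asserting and de-asserting a timer trigger."""
    start: float  # seconds at which the trigger was asserted (hypothetical field)
    end: float    # seconds at which the trigger was de-asserted (hypothetical field)


def interim_periods(samples):
    """Return the elapsed time between the end of each sample and the start of the next."""
    return [nxt.start - cur.end for cur, nxt in zip(samples, samples[1:])]


def is_command_data(samples, command_spacing, tolerance=0.1):
    """Assumed rule: every interim period lies within `tolerance` of `command_spacing`."""
    gaps = interim_periods(samples)
    # With fewer than two samples there is no interim period to compare.
    return bool(gaps) and all(abs(g - command_spacing) <= tolerance for g in gaps)


# Samples spaced half a second apart versus samples that run together.
paced = [AudioSample(0.0, 0.5), AudioSample(1.0, 1.5), AudioSample(2.0, 2.5)]
rushed = [AudioSample(0.0, 0.5), AudioSample(0.55, 1.0)]
print(is_command_data(paced, command_spacing=0.5))
print(is_command_data(rushed, command_spacing=0.5))
```

A real system would take its timestamps from the timer hardware or software the patent describes; the hard-coded values above only exercise the comparison.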

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for monitoring an area within a vehicle comprising: computer memory; a processor executing imaging software and audio processing software; an imaging device transmitting to said imaging software a plurality of frames of pixel data from an image acquired from a field of view within the vehicle and associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein the processor is configured to identify a source of the audio data stream from the frames of pixel data and the audio samples; wherein said imaging software isolates, from said frames a subset of digital pixel data representing a physical source of the audio data stream; wherein said memory comprises a translator dictionary matching a sequence of mouth movements represented by subsets of said digital pixel data to spoken utterances represented by portions of said audio samples; wherein said processor calculates a virtual position of the mouth relative to a set position of said speech input device in the vehicle to determine a direction of the audio data stream.

2. A system according to claim 1, further comprising an amplitude threshold value stored in said computer memory, wherein said audio processing software is further configured to compare an amplitude of respective audio samples with said amplitude threshold value to distinguish a valid audio sample, an invalid audio sample, and an interim period between audio samples.

3. A system according to claim 2, further comprising command processing software configured to (i) track said valid audio samples in a time domain, (ii) discard invalid audio samples and (iii) track said interim periods in said time domain.

4. A system according to claim 1, wherein said imaging software is configured to compare consecutive frames of pixel data and determine image differences between said consecutive frames.

5. A system according to claim 4, wherein said speech input originates from a user's mouth and said image differences comprise pixel differences in said frames that determine a user's mouth movement and/or non-movement.

6. A system according to claim 5, wherein said processor accesses command processing software stored in the computer memory and calculates a physical position of the mouth relative to the field of view of the imaging device from a virtual position of the mouth represented by a subset of the pixel data.

7. A system according to claim 6, wherein said processor accesses command processing software stored in the computer memory and determines a plurality of virtual positions of the mouth during either a valid audio sample or during an interim period.

8. A system according to claim 7, wherein said virtual positions of the mouth validate that the mouth is moving during said audio data stream of audio samples and/or not moving during an interim period.

9. A system according to claim 7, wherein said command processing software identifies a plurality of frames of pixel data representing respective virtual positions of the mouth that are grouped with at least one valid audio sample and compares said image differences among the plurality of frames to decipher a command from the user's mouth movement.

10. A system for monitoring an area within a vehicle comprising: computer memory; a processor executing imaging software and audio processing software; an imaging device transmitting to said imaging software a plurality of frames of pixel data from an image acquired from a field of view within the vehicle and associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein the processor is configured to identify a source of the audio data stream from the frames of pixel data and the audio samples; wherein said imaging software isolates, from said frames a subset of digital pixel data representing a physical source of the audio data stream; wherein said memory comprises a translator dictionary matching a sequence of mouth movements represented by the subsets of said digital pixel data to spoken utterances represented by portions of said audio samples; wherein said processor calculates a virtual position of the mouth relative to a set position of said speech input device in the vehicle to determine a direction of the audio data stream; and wherein the processor utilizes the direction of the audio data stream, the frames of the pixel data, and the spoken utterances to identify a source of the speech input.

11. A system according to claim 10, further comprising a database of voice tokens stored as portions of the audio samples to evaluate a speech input as a command also stored in the database for an identified user.

12. A system for confirming audible commands issued to an audio-visual monitoring system of a vehicle comprising: computer memory; a processor executing imaging software, audio processing software, and command processing software; an imaging device transmitting to said imaging software a plurality of sequential frames of digital pixel data from an image acquired within a field of view associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein said imaging software isolates, from said frames of digital pixel data, a subset of pixels representing a physical source of the speech input; and wherein said command processing software correlates, on a time basis, each audio sample to respective subsets of pixels representing the physical source in respective groups of sequential frames of image data; wherein said imaging software is configured to track multiple positions of said physical source of speech input by deriving respective positions of said physical source from the respective subsets of pixels; wherein said command processing software validates an audio sample as a command according to the respective positions of said physical source of speech input relative to said speech input device.

13. A system according to claim 12, wherein said command processing software stores in said memory a voice token profile comprising resulting correlations between said subsets of pixels and said audio samples, and wherein said imaging software accesses said voice token profile for setting image acquisition parameters in the camera.

14. A system according to claim 13, wherein said physical source represented by said subset of pixels is a human mouth.

15. A system according to claim 14, wherein said processor calculates a virtual position of the mouth relative to a set virtual position of said speech input device to determine a direction of input sound.

16. A system according to claim 15 wherein said processor consolidates said correlations into a translator dictionary matching a sequence of mouth movements represented by said respective subsets of pixels to spoken words represented by said portions of said set of audio samples.
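Claims 4, 5, and 8 describe comparing consecutive frames of pixel data and using the differences to check whether the mouth is moving while audio is present. The sketch below shows one simple way this could look, assuming frames are small grayscale grids already cropped to the mouth region. The mean-absolute-difference measure, the `motion_threshold` parameter, and all function names are illustrative assumptions, not the patented method.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = sum(abs(a - b)
                for row_p, row_c in zip(prev, curr)
                for a, b in zip(row_p, row_c))
    count = sum(len(row) for row in curr)
    return total / count


def mouth_movement(frames, motion_threshold):
    """For each consecutive frame pair, report whether the difference exceeds the threshold."""
    return [frame_difference(p, c) > motion_threshold
            for p, c in zip(frames, frames[1:])]


def movement_supports_sample(frames, motion_threshold):
    """Assumed check: an audio sample is supported if any grouped frame pair shows movement."""
    return any(mouth_movement(frames, motion_threshold))


# Three 2x2 frames: one pixel changes between the first two, then nothing changes.
speaking = [[[0, 0], [0, 0]], [[0, 0], [0, 10]], [[0, 0], [0, 10]]]
still = [[[5, 5], [5, 5]], [[5, 5], [5, 5]]]
print(mouth_movement(speaking, motion_threshold=1.0))
print(movement_supports_sample(still, motion_threshold=1.0))
```

Grouping frames with a particular audio sample would rely on the time-based correlation recited in claim 12; here the frame lists are assumed to be grouped already.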

Assignees

Joyson Safety Systems Acquisition LLC

Inventors

Classifications

G06T2207/30268 · Vehicle interior
G10L25/03 · characterised by the type of extracted parameters
G06T7/254 · involving subtraction of images
G10L2015/088 · Word spotting
G10L25/48 · specially adapted for particular use

Patent family

Related publications grouped by family.

View patent family 63585792

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10748542B2 cover?: A system for automated speech recognition utilizes computer memory, a processor executing imaging software and audio processing software, and a camera transmitting images of a physical source of speech input. Audio processing software includes an audio data stream of audio samples derived from at least one speech input. At least one timer is configured to transmit elapsed time values as measure…