Occupant monitoring systems and methods
US-2016185354-A1 · Jun 30, 2016 · US
US10748542B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10748542-B2 |
| Application number | US-201815934721-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 23, 2018 |
| Priority date | Mar 23, 2017 |
| Publication date | Aug 18, 2020 |
| Grant date | Aug 18, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A system for automated speech recognition utilizes computer memory, a processor executing imaging software and audio processing software, and a camera transmitting images of a physical source of speech input. Audio processing software includes an audio data stream of audio samples derived from at least one speech input. At least one timer is configured to transmit elapsed time values as measured in response to respective triggers received by the timer. The audio processing software is configured to assert and de-assert the timer triggers to measure respective audio sample times and interim period times between the audio samples. The audio processing software is further configured to compare the interim period times with a command spacing time value corresponding to an expected interim time value between commands, thereby determining if the speech input is command data or non-command data.
Opening claim text (preview).
The invention claimed is: 1. A system for monitoring an area within a vehicle comprising: computer memory; a processor executing imaging software and audio processing software; an imaging device transmitting to said imaging software a plurality of frames of pixel data from an image acquired from a field of view within the vehicle and associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein the processor is configured to identify a source of the audio data stream from the frames of pixel data and the audio samples; wherein said imaging software isolates, from said frames a subset of digital pixel data representing a physical source of the audio data stream; wherein said memory comprises a translator dictionary matching a sequence of mouth movements represented by subsets of said digital pixel data to spoken utterances represented by portions of said audio samples; wherein said processor calculates a virtual position of the mouth relative to a set position of said speech input device in the vehicle to determine a direction of the audio data stream. 2. A system according to claim 1 , further comprising an amplitude threshold value stored in said computer memory, wherein said audio processing software is further configured to compare an amplitude of respective audio samples with said amplitude threshold value to distinguish a valid audio sample, an invalid audio sample, and an interim period between audio samples. 3. A system according to claim 2 , further comprising command processing software configured to (i) track said valid audio samples in a time domain, (ii) discard invalid audio samples and (iii) track said interim periods in said time domain. 4. A system according to claim 1 , wherein said imaging software is configured to compare consecutive frames of pixel data and determine image differences between said consecutive frames. 5. A system according to claim 4 , wherein said speech input originates from a user's mouth and said image differences comprise pixel differences in said frames that determine a user's mouth movement and/or non-movement. 6. A system according to claim 5 , wherein said processor accesses command processing software stored in the computer memory and calculates a physical position of the mouth relative to the field of view of the imaging device from a virtual position of the mouth represented by a subset of the pixel data. 7. A system according to claim 6 , wherein said processor accesses command processing software stored in the computer memory and determines a plurality of virtual positions of the mouth during either a valid audio sample or during an interim period. 8. A system according to claim 7 , wherein said virtual positions of the mouth validate that the mouth is moving during said audio data stream of audio samples and/or not moving during an interim period. 9. A system according to claim 7 , wherein said command processing software identifies a plurality of frames of pixel data representing respective virtual positions of the mouth that are grouped with at least one valid audio sample and compares said image differences among the plurality of frames to decipher a command from the user's mouth movement. 10. A system for monitoring an area within a vehicle comprising: computer memory; a processor executing imaging software and audio processing software; an imaging device transmitting to said imaging software a plurality of frames of pixel data from an image acquired from a field of view within the vehicle and associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein the processor is configured to identify a source of the audio data stream from the frames of pixel data and the audio samples; wherein said imaging software isolates, from said frames a subset of digital pixel data representing a physical source of the audio data stream; wherein said memory comprises a translator dictionary matching a sequence of mouth movements represented by the subsets of said digital pixel data to spoken utterances represented by portions of said audio samples; wherein said processor calculates a virtual position of the mouth relative to a set position of said speech input device in the vehicle to determine a direction of the audio data stream; and wherein the processor utilizes the direction of the audio data stream, the frames of the pixel data, and the spoken utterances to identify a source of the speech input. 11. A system according to claim 10 , further comprising a database of voice tokens stored as portions of the audio samples to evaluate a speech input as a command also stored in the database for an identified user. 12. A system for confirming audible commands issued to an audio-visual monitoring system of a vehicle comprising: computer memory; a processor executing imaging software, audio processing software, and command processing software; an imaging device transmitting to said imaging software a plurality of sequential frames of digital pixel data from an image acquired within a field of view associated with the imaging device; a speech input device transmitting to said audio processing software an audio data stream of audio samples derived from at least one speech input; wherein said imaging software isolates, from said frames of digital pixel data, a subset of pixels representing a physical source of the speech input; and wherein said command processing software correlates, on a time basis, each audio sample to respective subsets of pixels representing the physical source in respective groups of sequential frames of image data; wherein said imaging software is configured to track multiple positions of said physical source of speech input by deriving respective positions of said physical source from the respective subsets of pixels; wherein said command processing software validates an audio sample as a command according to the respective positions of said physical source of speech input relative to said speech input device. 13. A system according to claim 12 , wherein said command processing software stores in said memory a voice token profile comprising resulting correlations between said subsets of pixels and said audio samples, and wherein said imaging software accesses said voice token profile for setting image acquisition parameters in the camera. 14. A system according to claim 13 , wherein said physical source represented by said subset of pixels is a human mouth. 15. A system according to claim 14 , wherein said processor calculates a virtual position of the mouth relative to a set virtual position of said speech input device to determine a direction of input sound. 16. A system according to claim 15 wherein said processor consolidates said correlations into a translator dictionary matching a sequence of mouth movements represented by said respective subsets of pixels to spoken words represented by said portions of said set of audio samples.
Vehicle interior · CPC title
characterised by the type of extracted parameters · CPC title
involving subtraction of images · CPC title
Word spotting · CPC title
specially adapted for particular use · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.