Voice-controlled camera operations
US-9031847-B2 · May 12, 2015 · US
US10360910B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10360910-B2 |
| Application number | US-201715687228-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 25, 2017 |
| Priority date | Aug 29, 2016 |
| Publication date | Jul 23, 2019 |
| Grant date | Jul 23, 2019 |
Official abstract text for this publication.
An automatic speech recognition (ASR) system is disclosed that compensates for different noise environments and types of speech. The ASR system may be implemented as part of an action camera that collects status data, such as geographic location data and/or sensor data. The ASR system may perform speech recognition using an acoustic model and a speech recognition model, which are trained for operation in specific noise environments and/or for specific types of speech. The computing device may categorize a current status of the action camera, as indicated by the status data, into an action profile, which may represent a particular activity (e.g., running, cycling, etc.) or state of the computing device. The computing device may dynamically switch the acoustic model and/or the speech recognition model to compensate for anticipated changes in the noise environment and speech based upon the action profile to facilitate the recognition of various action camera functions.
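The switching behavior described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Status` fields, the speed thresholds, the profile names other than the running and cycling examples, and the model identifiers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Hypothetical status snapshot; the abstract only says location and/or sensor data."""
    speed_mps: float  # assumed: speed derived from geographic location data


@dataclass(frozen=True)
class ModelPair:
    acoustic_model: str  # tuned for a noise environment
    speech_model: str    # tuned for a type of speech


# Assumed mapping from action profile to models; all names are placeholders.
MODELS = {
    "stationary": ModelPair("am_quiet", "srm_strict"),
    "running": ModelPair("am_breathing", "srm_tolerant"),
    "cycling": ModelPair("am_wind", "srm_tolerant"),
}


def classify_profile(status: Status) -> str:
    """Categorize the current status into an action profile (thresholds are assumed)."""
    if status.speed_mps < 0.5:
        return "stationary"
    if status.speed_mps < 5.0:
        return "running"
    return "cycling"


class Recognizer:
    """Holds the active model pair and swaps it when the action profile changes."""

    def __init__(self) -> None:
        self.profile = "stationary"
        self.models = MODELS[self.profile]

    def update(self, status: Status) -> bool:
        """Re-classify the status; return True if the models were switched."""
        profile = classify_profile(status)
        if profile == self.profile:
            return False
        self.profile = profile
        self.models = MODELS[profile]
        return True
```

Calling `update` on every new status reading keeps the model swap separate from recognition itself, so in this sketch the recognizer always reads `self.models` and never needs to know why the pair changed.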
Opening claim text (preview).
What is claimed is:
1. A computing device, comprising: a location-determining component configured to receive location signals and to generate geographic location data based on the received location signals; a sensor array configured to generate sensor data indicative of movement of the computing device; a memory configured to store a plurality of acoustic models and a plurality of speech recognition models to facilitate speech recognition, each acoustic model from among the plurality of acoustic models being associated with one or more acoustic tuning parameters corresponding to an environment with a unique noise characteristic, and each speech recognition model from among the plurality of speech recognition models being associated with a phonetic match tolerance; a processor unit coupled with the location-determining component, the sensor array, and the memory, the processor unit configured to: receive audible speech including a plurality of words; identify an action profile based on one or more of the geographic location data and the sensor data; select an acoustic model and a speech recognition model from among the plurality of acoustic models and speech recognition models based on the identified action profile; determine a phonetic term associated with each word in the received speech based on the selected acoustic model's acoustic tuning parameters to recognize speech; determine a meaning for each determined phonetic term by searching the selected speech recognition model for a match to the determined phonetic term, and execute a computing device function based on the determined meaning for each word within the received audible recognized speech.
2. The computing device of claim 1, wherein the processor unit is further configured to select a speech recognition model from among the plurality of speech models having a higher phonetic match tolerance when the action profile indicates movement of the computing device in excess of a predetermined movement threshold, the higher phonetic match tolerance resulting in a higher depth and breadth of search for a match to the determined phonetic term.
3. The computing device of claim 1, wherein the action profile indicates an instantaneous velocity of the computing device, and wherein the processor unit is further configured to select an acoustic model and a speech recognition model based on the instantaneous velocity of the computing device.
4. The computing device of claim 3, wherein: each acoustic model from among the plurality of acoustic models is associated with a predetermined range of computing device velocities, each speech recognition model from among the plurality of speech recognition models is associated with a predetermined range of computing device velocities, and the processor unit is further configured to select an acoustic model and a speech recognition model having a respective predetermined range of velocities associated with the instantaneous velocity of the computing device.
5. The computing device of claim 1, wherein the action profile indicates an orientation of the computing device, and wherein the processor unit is further configured to select an acoustic model and a speech recognition model based on the orientation of the computing device.
6. The computing device of claim 1, wherein the one or more acoustic tuning parameters associated with each acoustic model from among the plurality of acoustic models facilitates the determination of phonetic terms in accordance with a different level of noise tolerance.
7. The computing device of claim 1, wherein the acoustic model is trained in accordance with a type of speech resulting from a user performing a type of physical activity matching the identified action profile.
8. The computing device of claim 1, wherein the plurality of acoustic models and the plurality of speech recognition models facilitate speech recognition in accordance with a trigger speech recognizer that facilitates speech recognition of a wake word, and a command speech recognizer that facilitates speech recognition of computing device commands once the wake word is recognized, and wherein the processor unit is further configured to independently select an acoustic model and a speech recognition model for each of the trigger speech recognizer and the command speech recognizer.
9. The computing device of claim 1, further comprising: wherein the processor unit is further configured to control a microphone to receive the audible speech, and to maintain the microphone in an operating state such that audio input is continuously received via the microphone.
10. An action camera, comprising: a location-determining component configured to receive location signals and to generate geographic location data based on the received location signals; a sensor array configured to generate sensor data indicative of movement of the action camera; a memory configured to store a plurality of speech recognition models and a plurality of acoustic models to facilitate speech recognition, wherein each acoustic model from among the plurality of acoustic models is associated with one or more acoustic tuning parameters corresponding to an environment with a unique noise characteristic for a predetermined range of action camera velocities, and wherein each speech recognition model from among the plurality of speech recognition models is associated with a phonetic match tolerance for a predetermined range of action camera velocities, and a processor unit coupled with the location-determining component, the sensor array, and the memory, the processor unit configured to: receive audible speech including a plurality of words; calculate an instantaneous velocity of the action camera based on one or more of the geographic location data and the sensor data; select an acoustic model and a speech recognition model from among the plurality of acoustic models and speech recognition models having a respective predetermined range of action camera velocities that encompass the instantaneous velocity of the action camera; determine a phonetic term associated with each word in the received speech based on the selected acoustic model's acoustic tuning parameters to recognize speech; determine a meaning for each determined phonetic term by searching the selected speech recognition model for a match to the determined phonetic term, and execute a computing device function based on the determined meaning for each word within the received audible recognized speech.
11. The action camera of claim 10, wherein the sensor array includes one or more of an accelerometer, a gyroscope, a magnetometer, and a barometer.
12. The action camera of claim 10, wherein the action profile indicates an orientation of the computing device, and wherein the processor unit is further configured to select an acoustic model and a speech recognition model based on the orientation of the computing device.
13. The action camera of claim 10, wherein the processor unit is further configured to select a speech recognition model from among the plurality of speech models having a higher phonetic match tolerance when the action profile indicates movement of the action camera in excess of a predetermined movement threshold, the higher phonetic match tolerance resulting in a higher depth and breadth of search for a match to the determined phonetic term.
14. The action camera of claim 10, wherein the acoustic model is trained in accordance with a type of speech resulting from a user performing a type of physical activity matching the instantaneous velocity of the action camera.
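The velocity-range selection recited in claims 4 and 10, together with the phonetic match tolerance of claims 2 and 13, might be sketched as follows. This is an illustrative assumption only: the velocity ranges, tolerance values, and command vocabulary are invented, and plain string similarity from Python's standard `difflib` module stands in for phonetic matching, which the claims do not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class SpeechModel:
    """Hypothetical speech recognition model tied to a velocity range."""
    name: str
    min_mps: float    # inclusive lower bound of the velocity range
    max_mps: float    # exclusive upper bound of the velocity range
    tolerance: float  # assumed: allowed dissimilarity in [0, 1]


# Assumed ranges and tolerances: faster movement gets a looser match.
SPEECH_MODELS = [
    SpeechModel("slow", 0.0, 2.0, 0.10),
    SpeechModel("medium", 2.0, 8.0, 0.25),
    SpeechModel("fast", 8.0, float("inf"), 0.40),
]

# Assumed command vocabulary mapping spoken phrases to device functions.
COMMANDS = {
    "start recording": "START_RECORDING",
    "stop recording": "STOP_RECORDING",
    "take a photo": "TAKE_PHOTO",
}


def select_model(velocity_mps: float) -> SpeechModel:
    """Pick the model whose velocity range encompasses the given velocity."""
    for model in SPEECH_MODELS:
        if model.min_mps <= velocity_mps < model.max_mps:
            return model
    raise ValueError(f"no model covers velocity {velocity_mps}")


def match_command(heard: str, model: SpeechModel) -> Optional[str]:
    """Return the best-matching device function, or None if outside tolerance."""
    best_phrase, best_score = None, 0.0
    for phrase in COMMANDS:
        # String similarity is a stand-in for a phonetic comparison.
        score = SequenceMatcher(None, heard, phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_phrase is not None and best_score >= 1.0 - model.tolerance:
        return COMMANDS[best_phrase]
    return None
```

Under these assumed values, a slightly garbled input such as `"stob recordin"` is rejected by the strict `slow` model but accepted as `STOP_RECORDING` by the more tolerant `medium` model, which mirrors the idea that a higher tolerance widens the search for a match.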
Control of parameters via user interfaces · CPC title
Cross-Sectional Technologies · mapped topic
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Cross-Sectional Technologies · mapped topic
with voice recognition means · CPC title