Tool for assisting people with speech disorder
US-11763821-B1 · Sep 19, 2023 · US
US12190859B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12190859-B2 |
| Application number | US-202017792012-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 10, 2020 |
| Priority date | Feb 10, 2020 |
| Publication date | Jan 7, 2025 |
| Grant date | Jan 7, 2025 |
Abstract
Generating synthesized speech audio data on behalf of a given user in a conversation. The synthesized speech audio data includes synthesized speech that incorporates textual segment(s). The textual segment(s) can include recognized text that results from processing spoken input, of the given user, using a speech recognition model and/or can include a selection of a rendered suggestion that conveys the textual segment(s). Some implementations dynamically determine one or more prosodic properties for use in speech synthesis of the textual segment, and generate the synthesized speech with the one or more determined prosodic properties. The prosodic properties can be determined based on the textual segment(s) used in speech synthesis, textual segment(s) corresponding to recent spoken input of additional participant(s), attribute(s) of relationship(s) between the given user and additional participant(s) in the conversation, and/or feature(s) of a current location for the conversation.
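The abstract names the signals that can drive prosody but does not say how they map to concrete prosodic settings. The sketch below illustrates only the relationship-driven case, assuming a hypothetical `Relationship` record with an invented relationship label, an invented formality ranking, and invented preset values. When several participants are present it defers to the most formal relationship, which mirrors the preference described in claim 12. The single location feature (a "noisy" flag that raises volume) is likewise an illustrative assumption, not something the patent specifies.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class ProsodicProperties:
    # Hypothetical prosody knobs; a real TTS engine would define its own.
    speaking_rate: float    # 1.0 = engine default rate
    pitch_semitones: float  # offset from the default pitch
    volume_db: float        # gain relative to the default volume


@dataclass(frozen=True)
class Relationship:
    participant: str
    kind: str  # illustrative labels: "family", "friend", "colleague", "manager"


# Assumed formality ranking: a higher number means a more formal relationship.
FORMALITY = {"family": 0, "friend": 1, "colleague": 2, "manager": 3}

# Assumed presets: one prosody set per relationship kind (values are invented).
PRESETS = {
    "family": ProsodicProperties(1.1, 1.0, 0.0),
    "friend": ProsodicProperties(1.05, 0.5, 0.0),
    "colleague": ProsodicProperties(1.0, 0.0, 0.0),
    "manager": ProsodicProperties(0.9, -0.5, 0.0),
}

# Used when no participant with a known relationship has been identified.
NEUTRAL = ProsodicProperties(1.0, 0.0, 0.0)


def choose_prosody(
    relationships: Sequence[Relationship], noisy_location: bool = False
) -> ProsodicProperties:
    """Pick the preset for the most formal known relationship.

    Falls back to NEUTRAL when no relationship label is recognized. The
    noisy-location volume boost is a hypothetical location feature.
    """
    known = [r for r in relationships if r.kind in FORMALITY]
    if known:
        most_formal = max(known, key=lambda r: FORMALITY[r.kind])
        props = PRESETS[most_formal.kind]
    else:
        props = NEUTRAL
    if noisy_location:
        # Assumption: speak louder in a location classified as noisy.
        props = replace(props, volume_db=props.volume_db + 6.0)
    return props
```

For example, `choose_prosody([Relationship("Alice", "friend"), Relationship("Bob", "manager")])` returns the "manager" preset. Claim 11 describes the other option of combining attributes of several relationships; averaging the matching presets would be one way to sketch that variant.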
Claims (opening excerpt; the text is cut off partway through claim 13).
What is claimed is:
1. A method implemented by one or more processors, the method comprising: detecting, via one or more microphones of a client device of a given user, spoken input of the given user; determining, based on processing the spoken input of the given user, a textual segment for conveying in a conversation in which the given user is a participant; identifying an additional participant in the conversation, the additional participant being in addition to the given user, and the additional participant being physically located in an environment with the given user; determining at least one attribute of a relationship between the given user and the additional participant; determining, based on the at least one attribute of the relationship between the given user and the additional participant, a given set of one or more prosodic properties, wherein the given set of the one or more prosodic properties is a first set of the one or more prosodic properties in response to determining the at least one attribute of the relationship between the given user and the additional participant is a first attribute, and wherein the given set of the one or more prosodic properties is a second set of the one or more prosodic properties in response to determining the at least one attribute of the relationship between the given user and the additional participant is a second attribute; generating synthesized speech audio data that includes synthesized speech that incorporates the textual segment and that is synthesized with the given set of the one or more prosodic properties, wherein generating the synthesized speech audio data comprises synthesizing the synthesized speech with the given set of the one or more prosodic properties responsive to determining the given set of the one or more prosodic properties based on the attribute of the relationship between the given user and the additional participant; and causing the synthesized speech to be rendered via one or more speakers of the client device and/or an additional client device, wherein the rendered synthesized speech is audibly perceptible to the additional participant.
2. The method of claim 1, wherein determining, based on processing the spoken input of the given user, the textual segment, comprises: processing the spoken input using a speech recognition model to generate the textual segment.
3. The method of claim 2, wherein the speech recognition model is an on-device speech recognition model and/or is trained for recognizing speech of speech impaired users.
4. The method of claim 1, further comprising, subsequent to causing the synthesized speech to be rendered: detecting, via one or more of the microphones of the client device, an additional participant spoken input, of the additional participant; processing the additional participant spoken input using a speech recognition model to generate an additional participant textual segment that is a recognition of the additional participant spoken input; determining that an additional textual segment is a candidate response to the additional participant textual segment; and determining to display a graphical element that conveys the additional textual segment responsive to determining that the additional textual segment is the candidate response to the additional participant textual segment.
5. The method of claim 4, wherein identifying the additional participant in the conversation comprises: performing speaker identification using the additional participant spoken input; and identifying the additional participant based on the speaker identification.
6. The method of claim 5, wherein performing the speaker identification comprises: generating, at the client device, a spoken input embedding based on processing the additional participant spoken input using a speaker identification model; and comparing, at the client device, the spoken input embedding to a pre-stored embedding for the additional participant, the pre-stored embedding being previously stored locally at the client device responsive to authorization by the additional participant.
7. The method of claim 4, wherein determining that the additional textual segment is the candidate response to the additional participant textual segment is further based on the at least one attribute of the relationship between the given user and the additional participant.
8. The method of claim 7, wherein determining that the additional textual segment is the candidate response to the additional participant textual segment is further based on the at least one attribute of the relationship between the given user and the additional participant comprises: generating a superset of initial candidate responses based on the additional participant textual segment, the superset including the additional textual segment; and selecting, from the superset of initial candidate responses, the additional textual segment as the candidate response based on the at least one of the attributes of the relationship between the given user and the additional participant.
9. The method of claim 4, further comprising: determining at least one classification of a location of the client device; wherein determining that the additional textual segment is the candidate response to the additional participant textual segment is further based on the at least one classification of the location.
10. The method of claim 4, further comprising: in response to receiving a user selection, from the given user, of the graphical element that conveys the additional textual segment: generating additional synthesized speech audio data that includes additional synthesized speech that incorporates the additional textual segment and that is synthesized with the one or more prosodic properties, wherein generating the additional synthesized speech audio data comprises synthesizing the additional synthesized speech with the one or more prosodic properties; and causing the additional synthesized speech to be rendered via one or more of the speakers of the client device and/or the additional client device, wherein the rendered additional synthesized speech is audibly perceptible to the additional participant.
11. The method of claim 1, further comprising: identifying a further additional participant in the conversation, the further additional participant being in addition to the given user and being in addition to the additional participant; determining the given set of the one or more prosodic properties based on both: (a) the attribute of the relationship between the given user and the additional participant, and (b) one or more additional attributes of an additional relationship between the given user and the further additional participant.
12. The method of claim 1, further comprising: identifying a further additional participant in the conversation, the further additional participant being in addition to the given user and being in addition to the additional participant; determining the given set of the one or more prosodic properties based on the attribute of the relationship between the given user and the additional participant, in lieu of one or more additional attributes of an additional relationship between the given user and the further additional participant, responsive to: determining that the relationship between the given user and the additional participant is more formal than the additional relationship between the given user and the further additional participant.
13. The method of claim 1, further comprising: determining at least one classification of a location of the client device; wherein determining the
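Claims 5 through 8 describe identifying the other participant by comparing a spoken-input embedding against a locally stored embedding, then narrowing a superset of candidate responses using the relationship attribute. The claims fix neither a similarity metric nor a selection rule, so the sketch below swaps in two common, simple choices as assumptions: cosine similarity against a hypothetical acceptance threshold of 0.75, and filtering candidates tagged with a hypothetical register label ("formal" or "casual") that is assumed to have been derived upstream from the relationship. Embeddings are plain float lists here; the speaker identification model that would produce them is out of scope.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    spoken_embedding: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.75,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the enrolled participant whose stored embedding matches best,
    or None if no stored embedding reaches the threshold.

    `enrolled` stands in for embeddings kept locally on the client device
    after each participant authorized enrollment.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, stored in enrolled.items():
        score = cosine_similarity(spoken_embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def select_responses(
    superset: List[Tuple[str, str]],
    relationship_register: str,
    limit: int = 3,
) -> List[str]:
    """Narrow a superset of (text, register) candidates to those whose
    register matches the one assumed for this relationship, keeping order."""
    matching = [text for text, register in superset if register == relationship_register]
    return matching[:limit]
```

In use, `identify_speaker([0.88, 0.12, 0.01], {"Bob": [0.9, 0.1, 0.0], "Alice": [0.0, 1.0, 0.2]})` picks "Bob", and `select_responses` would then be called with the register associated with Bob's relationship to the given user. A learned ranker conditioned on the relationship would be a natural replacement for the exact-match filter.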
CPC classification titles:
Prosody rules derived from text; Stress or intonation
Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Detection of discrete points within a voice signal
Speech to text systems (G10L15/08 takes precedence)