Vocal action automation for controlling confidential content
US-2023208663-A1 · Jun 29, 2023 · US
US12411655B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411655-B2 |
| Application number | US-202418403083-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 3, 2024 |
| Priority date | Sep 8, 2022 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Abstract
An example server for constructing a virtual space includes a memory configured to store computer-executable instructions and a processor configured to execute the instructions by accessing the memory. The instructions, when executed, cause the processor to extract first partial voice data corresponding to a target utterance from voice data of a first user received from a terminal of the first user among users in the virtual space; instruct a target terminal of the target user to reproduce the first partial voice data; and, based on transmission of second partial voice data of a second user to the target user being requested while the target terminal reproduces the first partial voice data, instruct the target terminal to display visual information generated based on the second partial voice data.
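The abstract, together with claims 1, 2 and 11, describes a conflict rule: if a second speaker's clip is requested for a target user while that user's terminal is still reproducing a first clip, one clip keeps the audio channel and the other is shown as visual information. Claims 2 and 11 specify text converted from the second user's utterance. The Python below is a minimal sketch of that rule only, not the patentee's implementation. `PartialVoice`, `TargetTerminal`, `deliver`, `keep_current`, and the precomputed `transcript` field are all hypothetical. Claim 1 leaves the selection criterion open, so "keep the clip already playing" is an assumed default, and any other policy can be passed in as `select`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PartialVoice:
    """One clipped utterance ("partial voice data") from a single speaker."""
    speaker: str
    audio: bytes
    # Assumption: a transcript already produced by some speech-to-text step.
    transcript: str


@dataclass
class TargetTerminal:
    """Hypothetical stand-in for the target user's terminal."""
    playing: Optional[PartialVoice] = None
    captions: List[str] = field(default_factory=list)

    def reproduce(self, clip: PartialVoice) -> None:
        self.playing = clip

    def display_text(self, text: str) -> None:
        self.captions.append(text)

    def finish_playback(self) -> None:
        self.playing = None


# A selector picks which of (current, incoming) keeps the audio channel.
Selector = Callable[[PartialVoice, PartialVoice], PartialVoice]


def keep_current(current: PartialVoice, incoming: PartialVoice) -> PartialVoice:
    """Assumed default policy: the clip already playing keeps the channel."""
    return current


def deliver(terminal: TargetTerminal, clip: PartialVoice,
            select: Selector = keep_current) -> None:
    """Route a clip to a terminal and caption whichever clip is not played."""
    current = terminal.playing
    if current is None:
        terminal.reproduce(clip)  # nothing playing: reproduce the clip directly
        return
    chosen = select(current, clip)
    not_chosen = clip if chosen is current else current
    if chosen is not current:
        terminal.reproduce(chosen)  # switch audio to the newly selected clip
    # The non-selected utterance is shown as text instead of being reproduced.
    terminal.display_text(f"{not_chosen.speaker}: {not_chosen.transcript}")
```

With the default selector, a second clip that arrives mid-playback only adds a caption. Passing `select=lambda cur, new: new` instead switches the audio to the newcomer and captions the interrupted clip, which is the alternative that claim 1's "select" step permits.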
Claims (opening text, preview)
What is claimed is:
1. A server for constructing a virtual space, the server comprising: memory configured to store computer-executable instructions; and at least one processor, wherein the instructions, when executed, configure the at least one processor individually or collectively to control the server to at least: extract first partial voice data corresponding to a target utterance from voice data of a first user received from a terminal of the first user among users in the virtual space; determine a target user to receive the first partial voice data of the first user; instruct a target terminal of the target user to reproduce the first partial voice data; based on transmission to the target user of second partial voice data corresponding to an utterance from voice data of a second user being requested while the target terminal reproduces the first partial voice data, select partial voice data to instruct the target terminal to reproduce from among the first partial voice data and the second partial voice data; instruct the target terminal to reproduce the selected partial voice data; and instruct the target terminal to display visual information generated based on the non-selected one from among the first partial voice data and the second partial voice data.
2. A method performed by a server for constructing a virtual space, the method comprising: extracting first partial voice data corresponding to a target utterance from voice data of a first user received from a terminal of the first user among users in the virtual space; determining a target user to receive the first partial voice data of the first user; instructing a target terminal of the target user to reproduce the first partial voice data; and based on transmission to the target user of second partial voice data corresponding to an utterance from voice data of a second user being requested while the target terminal reproduces the first partial voice data, instructing the target terminal to reproduce the first partial voice data, but not the second partial voice data, and to display visual information including text converted from the utterance of the second user.
3. The method of claim 2, wherein the extracting of the first partial voice data comprises: detecting a start event and an end event from the voice data of the first user based on at least one of a gesture input of the first user or a portion of the voice data of the first user; and extracting from the voice data of the first user, as the first partial voice data, a portion corresponding to a time period between the start event and the end event.
4. The method of claim 2, further comprising: based on receiving the voice data of the first user from the terminal of the first user, starting transmission of the voice data of the first user to the users in the virtual space; based on detecting a start event from the voice data of the first user, stopping the transmission of the voice data of the first user to the users in the virtual space; and based on detecting an end event from the voice data of the first user, restarting the transmission of the voice data of the first user to the users in the virtual space.
5. The method of claim 2, wherein the instructing of the target terminal of the target user to reproduce the first partial voice data comprises: restricting transmission of the first partial voice data to a user among the users in the virtual space other than the determined target user.
6. The method of claim 2, wherein the instructing of the target terminal to display the visual information generated based on the second partial voice data comprises: instructing the target terminal to restrict reproduction of the second partial voice data.
7. The method of claim 2, further comprising: selecting partial voice data to instruct the target terminal to reproduce from among the first partial voice data and the second partial voice data; instructing the target terminal to reproduce the selected partial voice data; and instructing the target terminal to display visual information generated based on partial voice data other than the selected partial voice data among the first partial voice data and the second partial voice data.
8. The method of claim 2, further comprising: determining an artificial intelligence (AI) server other than the server as a receiver of the first partial voice data based on at least one of a gesture input of the first user or the first partial voice data; based on the determining of the AI server as the receiver of the first partial voice data, transmitting the first partial voice data to the AI server; and restricting transmission of the first partial voice data to a user other than the first user among the users in the virtual space.
9. The method of claim 8, further comprising: transmitting feedback voice data received from the AI server to the first user; and restricting the transmission of the feedback voice data to a user other than the first user.
10. The method of claim 2, wherein the determining of the target user comprises: based on not determining a user among the users in the virtual space to receive the first partial voice data, determining all users in the virtual space as target users.
11. A server for constructing a virtual space, the server comprising: memory configured to store computer-executable instructions; and at least one processor, wherein the instructions, when executed, configure the at least one processor individually or collectively to control the server to at least: extract first partial voice data corresponding to a target utterance from voice data of a first user received from a terminal of the first user among users in the virtual space; determine a target user to receive the first partial voice data of the first user; instruct a target terminal of the target user to reproduce the first partial voice data; and based on transmission to the target user of second partial voice data corresponding to an utterance from voice data of a second user being requested while the target terminal reproduces the first partial voice data, instruct the target terminal to reproduce the first partial voice data, but not the second partial voice data, and to display visual information including text converted from the utterance of the second user.
12. The server of claim 11, wherein the instructions, when executed, configure the at least one processor to individually or collectively control the server to: detect a start event and an end event from the voice data of the first user based on at least one of a gesture input of the first user or a portion of the voice data of the first user; and extract from the voice data of the first user, as the first partial voice data, a portion corresponding to a time period between the start event and the end event.
13. The server of claim 11, wherein the instructions, when executed, configure the at least one processor to individually or collectively control the server to: based on receiving of the voice data of the first user from the terminal of the first user, start transmission of the voice data of the first user to the users in the virtual space; based on detecting of a start event from the voice data of the first user, stop the transmission of the voice data of the first user to the users in the virtual space; and based on detecting of an end event from the voice data of the first user, restart the transmission of the voice data of the first user to the users in the virtual space.
14. The server of claim 11, wherein the instructions, when executed, configure the at l…
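Claims 3, 4, 5 and 10 describe the routing around a single utterance. The server cuts out the portion between a start event and an end event, pauses the normal broadcast during that span, delivers the cut-out portion only to the target users, and falls back to all users when no target was determined. The Python below is a minimal sketch of that flow under stated assumptions, not the patentee's implementation. The `(timestamp, chunk)` frame format and the names `split_on_events`, `resolve_targets` and `route` are hypothetical. Event detection itself (gesture input or a portion of the voice data) is assumed to have already produced the `start` and `end` timestamps. Excluding the speaker from the recipients is also an assumption.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical frame format: (timestamp in seconds, audio chunk).
Frame = Tuple[float, bytes]


def split_on_events(frames: Sequence[Frame], start: float,
                    end: float) -> Tuple[List[Frame], List[Frame]]:
    """Split a speaker's stream at one start/end event pair.

    Frames with start <= t < end become the partial voice data. Every other
    frame stays in the normal broadcast, so broadcasting effectively stops at
    the start event and restarts at the end event.
    """
    if end < start:
        raise ValueError("end event precedes start event")
    broadcast: List[Frame] = []
    partial: List[Frame] = []
    for t, chunk in frames:
        if start <= t < end:
            partial.append((t, chunk))
        else:
            broadcast.append((t, chunk))
    return broadcast, partial


def resolve_targets(requested: Set[str], users: Set[str]) -> Set[str]:
    """Requested users present in the space; if none, every user is a target."""
    targets = requested & users
    return targets if targets else set(users)


def route(frames: Sequence[Frame], start: float, end: float, speaker: str,
          requested: Set[str], users: Set[str]) -> Dict[str, List[Frame]]:
    """Build per-listener frame lists; only targets receive the partial data."""
    broadcast, partial = split_on_events(frames, start, end)
    targets = resolve_targets(requested, users)
    out: Dict[str, List[Frame]] = {}
    for user in users - {speaker}:  # assumption: speakers do not hear themselves
        received = list(broadcast)
        if user in targets:
            received.extend(partial)
        out[user] = sorted(received, key=lambda f: f[0])
    return out
```

For example, with four one-second frames and events at 1.0 s and 3.0 s, a targeted listener receives all four frames. A non-targeted listener receives only the frames at 0.0 s and 3.0 s, so the middle utterance is withheld from them as claim 5 requires.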
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
Indicating arrangements; Control arrangements, e.g. balance control · CPC title
Segmentation; Word boundary detection · CPC title