Method and apparatus for medical data auto collection segmentation and analysis platform

US10943673B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10943673-B2
Application number  US-201916379992-A
Country             US
Kind code           B2
Filing date         Apr 10, 2019
Priority date       Apr 10, 2019
Publication date    Mar 9, 2021
Grant date          Mar 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of medical data auto collection segmentation and analysis, includes collecting, from a plurality of sources, unstructured medical data in a plurality of formats, recognizing a medical name entity of each piece of the unstructured medical data, using a medical dictionary, and performing semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic. The method further includes generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups, and indexing the structured medical data into elastic search clusters.
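The dictionary-based recognition step and the generation of a structured record described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the toy `MEDICAL_DICTIONARY`, the `StructuredRecord` shape, and the longest-match lookup strategy are not taken from the patent, which does not specify how the dictionary is applied.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy dictionary (surface form -> entity type); not from the patent.
MEDICAL_DICTIONARY = {
    "type 2 diabetes": "disease",
    "metformin": "drug",
    "blurred vision": "symptom",
}


@dataclass
class StructuredRecord:
    """Assumed shape of one structured piece of medical data."""
    source: str
    text: str
    entities: list = field(default_factory=list)


def recognize_entities(text, dictionary):
    """Find dictionary terms in text, preferring longer terms over shorter ones.

    Returns a list of (term, entity_type, start_offset) sorted by offset.
    """
    lowered = text.lower()
    taken = [False] * len(lowered)  # characters already claimed by a match
    found = []
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            if not any(taken[match.start():match.end()]):
                for i in range(match.start(), match.end()):
                    taken[i] = True
                found.append((term, dictionary[term], match.start()))
    return sorted(found, key=lambda entity: entity[2])


def to_structured(source, text, dictionary=MEDICAL_DICTIONARY):
    """Attach recognized entities to a piece of unstructured text."""
    return StructuredRecord(source, text, recognize_entities(text, dictionary))


record = to_structured("forum", "Metformin is often prescribed for type 2 diabetes.")
print(record.entities)
```

A record like this could then be serialized and sent to a search index; the indexing call itself is left out because the page does not describe it.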

First claim

Opening claim text (preview).

What is claimed is:

1. A method of medical data auto collection segmentation and analysis, the method comprising: collecting, from a plurality of sources, unstructured medical data in a plurality of formats; recognizing a medical name entity of each piece of the unstructured medical data, using a medical dictionary; performing semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and indexing the structured medical data into elastic search clusters, wherein the performing the semantic text segmentation comprises: training a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data; and for each of sentences of the unstructured medical data: outputting LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.

2. The method of claim 1, further comprising controlling to search for and display at least one of the elastic search clusters.

3. The method of claim 1, further comprising generating the medical dictionary, using the unstructured medical data.

4. The method of claim 1, wherein the performing the semantic text segmentation further comprises, for each of sentences of the unstructured medical data: summing the first standard derivation scores and the second standard derivation scores; averaging the first standard derivation scores and the second standard derivation scores that are summed, to determine an average score; and determining a topic of a respective one of the sentences, based on the average score.

5. The method of claim 1, further comprising generating a hieratical tree structure of metadata of each piece of the unstructured medical data, wherein the indexing the structured medical data comprises indexing the structured medical data into the elastic search clusters, using the hieratical tree structure of metadata of each piece of the unstructured medical data.

6. The method of claim 1, wherein the unstructured medical data comprises any one or any combination of medical books, diagnosis cases, forum discussions and medical papers, from the Internet.

7. An apparatus for medical data auto collection segmentation and analysis, the apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: collecting code configured to cause the at least one processor to collect, from a plurality of sources, unstructured medical data in a plurality of formats; recognizing code configured to cause the at least one processor to recognize a medical name entity of each piece of the unstructured medical data, using a medical dictionary; performing code configured to cause the at least one processor to perform semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; first generating code configured to cause the at least one processor to generate, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and indexing code configured to cause the at least one processor to index the structured medical data into elastic search clusters, wherein the performing code is further configured to cause the at least one processor to: train a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data; and for each of sentences of the unstructured medical data: output LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.

8. The apparatus of claim 7, further comprising controlling code configured to cause the at least one processor to control to search for and display at least one of the elastic search clusters.

9. The apparatus of claim 7, further comprising second generating code configured to cause the at least one processor to generate the medical dictionary, using the unstructured medical data.

10. The apparatus of claim 7, wherein the performing code is further configured to cause the at least one processor to, for each of sentences of the unstructured medical data: sum the first standard derivation scores and the second standard derivation scores; average the first standard derivation scores and the second standard derivation scores that are summed, to determine an average score; and determine a topic of a respective one of the sentences, based on the average score.

11. The apparatus of claim 7, further comprising second generating code configured to cause the at least one processor to generate a hieratical tree structure of metadata of each piece of the unstructured medical data, wherein the indexing code is further configured to cause the at least one processor to index the structured medical data into the elastic search clusters, using the hieratical tree structure of metadata of each piece of the unstructured medical data.

12. The apparatus of claim 7, wherein the unstructured medical data comprises any one or any combination of medical books, diagnosis cases, forum discussions and medical papers, from the Internet.

13. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor of a device, cause the at least one processor to: collect, from a plurality of sources, unstructured medical data in a plurality of formats; recognize a medical name entity of each piece of the unstructured medical data, using a medical dictionary; perform semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; generate, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and index the structured medical data into elastic search clusters, wherein the instructions further cause the at least one processor to: train a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data, for each of sentences of the unstructured medical data: output LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.

14. The non-transitory computer-readable medium of claim 13, wherein the instructions further cause the at least one processor to control to search for and display at least one of the elastic search clusters.

15. The non-transitory computer-readable medium of claim 13, wherein
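
The per-sentence scoring recited in claims 1 and 4 (softmax over the LDA scores and over the NMF scores, then summing and averaging the two results to pick a topic) can be sketched as follows. This is a hedged illustration, not the patented implementation: model training is omitted, the input scores are made-up numbers, the function names are hypothetical, and choosing the topic with the highest average is an assumption, since the claim only says the topic is determined "based on the average score".

```python
import math


def softmax(scores):
    """Numerically stable softmax over one sentence's topic scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_topic(lda_scores, nmf_scores):
    """Fuse LDA and NMF topic scores for a single sentence.

    Returns (topic_index, averaged_scores). Picking the argmax of the
    average is an assumption made for this sketch.
    """
    first = softmax(lda_scores)    # softmax of the LDA scores
    second = softmax(nmf_scores)   # softmax of the NMF scores
    average = [(a + b) / 2.0 for a, b in zip(first, second)]
    topic = max(range(len(average)), key=average.__getitem__)
    return topic, average


# Made-up scores for two sentences over three topics (illustrative only).
sentences = [
    ([0.7, 0.2, 0.1], [1.5, 0.3, 0.0]),
    ([0.1, 0.1, 0.8], [0.0, 0.2, 2.4]),
]
topics = [sentence_topic(lda, nmf)[0] for lda, nmf in sentences]
print(topics)
```

Applying softmax to each model's output first puts the LDA and NMF scores on the same 0-to-1 scale, so the average is not dominated by whichever model happens to produce larger raw values.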

Assignees

Tencent America LLC

Inventors

Classifications

  • Thesaurus · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title

  • Recognition of textual entities · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10943673B2 cover?
A method of automatically collecting, segmenting, and analyzing medical data. Unstructured medical data in multiple formats is collected from multiple sources, medical name entities are recognized using a medical dictionary, each piece of data is partitioned into groups sharing a same topic by semantic text segmentation, and the resulting structured medical data is indexed into elastic search clusters.
Who is the assignee on this patent?
Tencent America LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 9, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).