Systems and methods for segmenting documents
US-2020134024-A1 · Apr 30, 2020 · US
US10943673B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10943673-B2 |
| Application number | US-201916379992-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 10, 2019 |
| Priority date | Apr 10, 2019 |
| Publication date | Mar 9, 2021 |
| Grant date | Mar 9, 2021 |
Official abstract text for this publication.
A method of medical data auto collection segmentation and analysis, includes collecting, from a plurality of sources, unstructured medical data in a plurality of formats, recognizing a medical name entity of each piece of the unstructured medical data, using a medical dictionary, and performing semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic. The method further includes generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups, and indexing the structured medical data into elastic search clusters.
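The dictionary-based recognition step in the abstract can be illustrated with a minimal sketch. It assumes the medical dictionary is a plain mapping from surface terms to entity labels; the dictionary contents, the function names `recognize_entities` and `to_structured`, and the record layout are all hypothetical and are not taken from the patent.

```python
import re

# Hypothetical mini-dictionary (surface form -> entity label); not from the patent.
MEDICAL_DICTIONARY = {
    "type 2 diabetes": "DISEASE",
    "diabetes": "DISEASE",
    "metformin": "DRUG",
    "blurred vision": "SYMPTOM",
}


def recognize_entities(text, dictionary):
    """Return (start, end, surface, label) tuples for dictionary terms in text."""
    # Longer terms come first so "type 2 diabetes" wins over "diabetes".
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def to_structured(doc_id, source, text, dictionary):
    """Wrap raw text and its recognized entities in a plain dict record."""
    return {
        "id": doc_id,
        "source": source,
        "text": text,
        "entities": [
            {"start": s, "end": e, "text": surface, "label": label}
            for s, e, surface, label in recognize_entities(text, dictionary)
        ],
    }


if __name__ == "__main__":
    sample = "Patient with Type 2 diabetes reports blurred vision after starting metformin."
    for entity in to_structured("doc-1", "forum", sample, MEDICAL_DICTIONARY)["entities"]:
        print(entity)
```

Exact string matching is only one possible reading of "using a medical dictionary"; a production system would likely need normalization, abbreviations, and handling of overlapping terms.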
Opening claim text (preview).
What is claimed is:
1. A method of medical data auto collection segmentation and analysis, the method comprising: collecting, from a plurality of sources, unstructured medical data in a plurality of formats; recognizing a medical name entity of each piece of the unstructured medical data, using a medical dictionary; performing semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and indexing the structured medical data into elastic search clusters, wherein the performing the semantic text segmentation comprises: training a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data; and for each of sentences of the unstructured medical data: outputting LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.
2. The method of claim 1, further comprising controlling to search for and display at least one of the elastic search clusters.
3. The method of claim 1, further comprising generating the medical dictionary, using the unstructured medical data.
4. The method of claim 1, wherein the performing the semantic text segmentation further comprises, for each of sentences of the unstructured medical data: summing the first standard derivation scores and the second standard derivation scores; averaging the first standard derivation scores and the second standard derivation scores that are summed, to determine an average score; and determining a topic of a respective one of the sentences, based on the average score.
5. The method of claim 1, further comprising generating a hieratical tree structure of metadata of each piece of the unstructured medical data, wherein the indexing the structured medical data comprises indexing the structured medical data into the elastic search clusters, using the hieratical tree structure of metadata of each piece of the unstructured medical data.
6. The method of claim 1, wherein the unstructured medical data comprises any one or any combination of medical books, diagnosis cases, forum discussions and medical papers, from the Internet.
7. An apparatus for medical data auto collection segmentation and analysis, the apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: collecting code configured to cause the at least one processor to collect, from a plurality of sources, unstructured medical data in a plurality of formats; recognizing code configured to cause the at least one processor to recognize a medical name entity of each piece of the unstructured medical data, using a medical dictionary; performing code configured to cause the at least one processor to perform semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; first generating code configured to cause the at least one processor to generate, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and indexing code configured to cause the at least one processor to index the structured medical data into elastic search clusters, wherein the performing code is further configured to cause the at least one processor to: train a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data; and for each of sentences of the unstructured medical data: output LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.
8. The apparatus of claim 7, further comprising controlling code configured to cause the at least one processor to control to search for and display at least one of the elastic search clusters.
9. The apparatus of claim 7, further comprising second generating code configured to cause the at least one processor to generate the medical dictionary, using the unstructured medical data.
10. The apparatus of claim 7, wherein the performing code is further configured to cause the at least one processor to, for each of sentences of the unstructured medical data: sum the first standard derivation scores and the second standard derivation scores; average the first standard derivation scores and the second standard derivation scores that are summed, to determine an average score; and determine a topic of a respective one of the sentences, based on the average score.
11. The apparatus of claim 7, further comprising second generating code configured to cause the at least one processor to generate a hieratical tree structure of metadata of each piece of the unstructured medical data, wherein the indexing code is further configured to cause the at least one processor to index the structured medical data into the elastic search clusters, using the hieratical tree structure of metadata of each piece of the unstructured medical data.
12. The apparatus of claim 7, wherein the unstructured medical data comprises any one or any combination of medical books, diagnosis cases, forum discussions and medical papers, from the Internet.
13. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor of a device, cause the at least one processor to: collect, from a plurality of sources, unstructured medical data in a plurality of formats; recognize a medical name entity of each piece of the unstructured medical data, using a medical dictionary; perform semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic; generate, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups; and index the structured medical data into elastic search clusters, wherein the instructions further cause the at least one processor to: train a latent Dirichlet allocation (LDA) model and a non-negative matrix factorization (NMF) model, using the unstructured medical data, for each of sentences of the unstructured medical data: output LDA scores and NMF scores respectively from the LDA model and the NMF model; and performing a softmax function on each of the LDA scores and the NMF scores to respectively generate first standard derivation scores and second standard derivation scores.
14. The non-transitory computer-readable medium of claim 13, wherein the instructions further cause the at least one processor to control to search for and display at least one of the elastic search clusters.
15. The non-transitory computer-readable medium of claim 13, wherein
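The per-sentence scoring recited in claims 1 and 4 (softmax over each model's scores, then summing and averaging to pick a topic) might look roughly like the sketch below. It is one possible reading, not the patented implementation. It assumes each model yields one raw score per topic for a sentence and that topic index k means the same topic in both models, which LDA and NMF do not guarantee on their own. The raw scores are passed in directly rather than produced by trained models, and the names `sentence_topic` and `group_by_topic` are hypothetical. Grouping adjacent sentences with the same topic is likewise an assumed interpretation of "partitioned into groups sharing a same topic".

```python
import math
from itertools import groupby


def softmax(scores):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_topic(lda_scores, nmf_scores):
    """Combine per-topic scores from two models and return (topic, averages)."""
    first = softmax(lda_scores)   # normalized LDA scores
    second = softmax(nmf_scores)  # normalized NMF scores
    # Assumed reading of claim 4: sum per topic, then divide by the two models.
    summed = [a + b for a, b in zip(first, second)]
    averaged = [s / 2 for s in summed]
    topic = max(range(len(averaged)), key=averaged.__getitem__)
    return topic, averaged


def group_by_topic(sentences, topics):
    """Merge runs of adjacent sentences that were assigned the same topic."""
    pairs = zip(topics, sentences)
    return [
        (topic, [sentence for _, sentence in run])
        for topic, run in groupby(pairs, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    # Hypothetical raw scores for one sentence over three topics.
    topic, averaged = sentence_topic([0.1, 0.7, 0.2], [0.0, 1.5, 0.3])
    print(topic, [round(a, 3) for a in averaged])
    print(group_by_topic(["s1", "s2", "s3"], [1, 1, 0]))
```

Applying softmax before combining puts both models' outputs on the same 0 to 1 scale, so neither model dominates the average merely because its raw scores are larger.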
Thesaurus · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Clustering; Classification · CPC title
Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title
Recognition of textual entities · CPC title