Systems and methods for order-of-magnitude viral cascade prediction in social networks
US-10437945-B2 · Oct 8, 2019 · US
US11275900B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275900-B2 |
| Application number | US-201916405612-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 7, 2019 |
| Priority date | May 9, 2018 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Abstract
Embodiments of a computer-implemented system for improving classification of data associated with the deep web or dark net are disclosed.
Claims
What is claimed is:

1. A computer-implemented system for improving classification of criminal activities arising from cyber environments, comprising: a processor configured to: access a ground truth dataset generated from deep web forum information, the ground truth dataset defining a predetermined tag hierarchy including predetermined tag labels corresponding topics of the deep web forum information; access data related to a discussion topic for classification from a deep web forum; extract a set of features from the data by assigning word vectors or paragraph vectors to the data; and apply a machine classifier to the set of features to generate a prediction list of tags for classifying the discussion topic, wherein the prediction list includes a prediction probability value for each tag of the plurality of tags; and add all parent tags associated with a tag of the plurality of tags to the prediction list based on a comparison between the prediction probability value for the tag and a first predetermined threshold value.
2. The computer-implemented system of claim 1, wherein a predetermined tag hierarchy is divided into parent tags and child tags, wherein a topic represented by a child tag is representative of a subset of a topic represented by a parent tag, wherein each child tag corresponds to a particular parent tag, and each tag is assigned a corresponding probability value.
3. The computer-implemented system of claim 1, wherein the processor is configured to remove all child tags of the plurality of tags corresponding to a parent tag from the prediction list of tags where a probability value associated with the child tag is below a second predetermined threshold value.
4. The computer-implemented system of claim 1, wherein the processor is configured to add all parent tags of the plurality of tags corresponding to a child tag if a probability value associated with the parent tag is above the first predetermined threshold value.
5. The computer-implemented system of claim 1, wherein the set of features are extracted using textual feature extraction to identify tags associated with topic titles of the data.
6. The computer-implemented system of claim 1, wherein a set of synthetic data samples are created for addition to the ground truth dataset to address class imbalance by creating a set of synthetic sample points along a line defined between a first minority feature vector and a second feature vector and adding the set of synthetic sample points to the ground truth dataset.
7. The computer-implemented system of claim 1, wherein the ground truth dataset is supplemented by, for every sample of the ground truth dataset in a minority class, adding top similar documents from the deep web forum information, as identified using an elastic search similarity score, and adding them to the ground truth dataset.
8. The computer-implemented system of claim 1, wherein the data is parsed and segmented into tokens, and predetermined characters and predetermined common words are removed from the data.
9. The computer-implemented system of claim 1, wherein words within the data are reduced to their root form using stemming and lemmatization procedures.
10. The computer-implemented system of claim 1, wherein the set of features are extracted from the data using a neural network, wherein the neural network is configured to learn a continuous distributed vector representation for documents and words within the data.
11. The computer-implemented system of claim 1, further comprising: a crawler in operable communication with the processor and operable to traverse a plurality of deep web websites to retrieve Internet documents forming at least a portion of the deep web forum information.
12. A method, comprising: configuring a processor for executing operations including: accessing data associated with a deep web forum, the data defining a topic for classification; extracting a set of features from the data as inputs for a machine classifier; and apply a machine classifier to the set of features to generate a prediction list of tags for classifying the topic, wherein the prediction list includes a prediction probability value for each tag of the plurality of tags; and adding all parent tags associated with a tag of the plurality of tags to the prediction list based on a comparison between the prediction probability value for the tag and a first predetermined threshold value.
13. The method of claim 12, further comprising: outputting an update to the prediction list of tags, a total number of the tags of the prediction list of tags being different from the update to the prediction list of tags to maintain a predetermined tag hierarchy.
14. The method of claim 12, further comprising generating a predetermined tag hierarchy based on a ground truth dataset, and leveraging semi-supervised learning to address class imbalance and increase accuracy of the machine classifier.
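The tag-hierarchy step in claims 1, 3 and 4 can be illustrated with a minimal Python sketch. It assumes the classifier has already produced one probability per tag and that the hierarchy is supplied as a child-to-parent map. The function names, the `decision` cutoff and every default threshold value are hypothetical and do not come from the patent. Claim 4 conditions on the parent tag's probability. This sketch instead follows the wording of claim 1 and compares the predicted tag's own probability.

```python
from typing import Dict, Optional, Set


def ancestors(tag: str, parent_of: Dict[str, str]) -> Set[str]:
    """Return every ancestor of `tag` by following child -> parent links."""
    found: Set[str] = set()
    parent: Optional[str] = parent_of.get(tag)
    while parent is not None and parent not in found:  # stop on malformed cycles
        found.add(parent)
        parent = parent_of.get(parent)
    return found


def enforce_hierarchy(
    probs: Dict[str, float],
    parent_of: Dict[str, str],
    decision: float = 0.5,          # hypothetical flat-classifier cutoff
    first_threshold: float = 0.5,   # hypothetical "add parents" threshold
    second_threshold: float = 0.6,  # hypothetical "drop child" threshold
) -> Set[str]:
    """Post-process flat per-tag probabilities into a hierarchy-consistent tag set."""
    # Tags the flat classifier would emit on its own.
    predicted = {t for t, p in probs.items() if p >= decision}

    # Add all ancestors of tags whose probability clears the first threshold.
    added: Set[str] = set()
    for t in predicted:
        if probs[t] >= first_threshold:
            added |= ancestors(t, parent_of)

    # Drop child tags (tags that have a parent) below the second threshold.
    dropped = {t for t in predicted if t in parent_of and probs[t] < second_threshold}

    # Removal happens before the union, so ancestors added above are kept.
    return (predicted - dropped) | added
```

Consider the hypothetical input `enforce_hierarchy({"hacking": 0.3, "malware": 0.55}, {"malware": "hacking"})`. Here the uncertain child `malware` is replaced by its parent, and the call returns `{"hacking"}`. This shows how the number of output tags can change while the hierarchy stays consistent, in the spirit of claim 13.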
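Claim 6 places synthetic points on the line between two feature vectors. That is the interpolation used by SMOTE (Synthetic Minority Over-sampling Technique). The sketch below is a plain-Python version of that standard technique. It assumes, as SMOTE does, that the second vector is one of the k nearest minority-class neighbours of the first. The claim text shown here does not say how the second vector is chosen, and `smote_like`, `k` and `seed` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def smote_like(
    minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Create `n_new` synthetic points on segments between nearby minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    pts = [[float(x) for x in p] for p in minority]
    k = min(k, len(pts) - 1)

    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(pts))
        base = pts[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(pts)) if j != i),
            key=lambda j: math.dist(base, pts[j]),
        )
        neighbour = pts[rng.choice(others[:k])]
        lam = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + lam * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

Each synthetic point is `base + lam * (neighbour - base)`, so it always lies on the segment between two real minority samples. The new points would then be added to the minority class of the ground truth dataset.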
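Claims 8 and 9 describe three steps: tokenizing the text, removing predetermined characters and common words, and reducing words to a root form. A minimal sketch follows. The stop-word list and the suffix-stripping `crude_stem` function are hypothetical stand-ins for a real stemmer or lemmatizer, such as NLTK's `PorterStemmer` or `WordNetLemmatizer`. They only show where each step sits in the pipeline.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list ("predetermined common words").
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "are", "to", "of", "and", "or", "for", "in", "on", "with"}
)
# Toy suffix list, checked in order; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def preprocess(text: str) -> List[str]:
    """Lower-case, remove non-alphanumeric characters, tokenize, drop stop words, stem."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [crude_stem(tok) for tok in text.split() if tok not in STOPWORDS]
```

For example, `preprocess("Selling hacked accounts, and the exploits!")` returns `["sell", "hack", "account", "exploit"]`. The resulting tokens are what a word-vector or paragraph-vector model (claims 1 and 10) would consume.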
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Semantic analysis · CPC title
Backpropagation, e.g. using gradient descent · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
Parsing markup language streams (streaming G06F40/149) · CPC title