Similar publication: Disambiguating unrecognized abbreviations in search queries using machine learning (US-2024070178-A1 · Feb 29, 2024 · US)
US9111014B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9111014-B1 |
| Application number | US-201213345217-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jan 6, 2012 |
| Priority date | Jan 6, 2012 |
| Publication date | Aug 18, 2015 |
| Grant date | Aug 18, 2015 |
Official abstract text for this publication.
Disclosed are various embodiments for a rule builder for data processing. A proper subset of a set of strings is selected. A first user interface is generated that is configured to present the proper subset of the set of strings. The first user interface is further configured to obtain multiple substring selections corresponding to each one of the proper subset of the set of strings. One or more selection patterns are identified based at least in part on the corresponding substring selections. A second user interface is generated that is configured to present the selection patterns for user verification.
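The core step of the abstract turns a handful of highlighted substrings into candidate selection patterns. Below is a minimal Python sketch of that step. It is an illustration under stated assumptions, not the patentee's implementation. The `Selection` type, the ASCII character classes, and the rule set are hypothetical choices modeled on the rule types listed in claim 3:

- a substring in common;
- a quantity of selected characters;
- a class of selected characters;
- anchoring at the start or end of the string, as a simplified stand-in for relative position.

Each candidate is written as a regular expression. It is kept only if its first match in every string reproduces the user's selection exactly.

```python
import re
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """Hypothetical record of one user highlight: the full string and the [start, end) span chosen."""
    text: str
    start: int
    end: int

    @property
    def chosen(self) -> str:
        return self.text[self.start:self.end]


# Assumed character classes, ASCII-only so the regex and the membership test agree.
CLASSES = [
    ("[0-9]", set(string.digits)),
    ("[A-Z]", set(string.ascii_uppercase)),
    ("[a-z]", set(string.ascii_lowercase)),
    ("[A-Za-z]", set(string.ascii_letters)),
    ("[A-Za-z0-9]", set(string.ascii_letters + string.digits)),
]


def candidate_patterns(selections: list[Selection]) -> list[str]:
    """Propose regexes from simple hypothetical rules; nothing is verified yet."""
    chosen = [s.chosen for s in selections]
    same_length = len({len(c) for c in chosen}) == 1
    base: list[str] = []
    # Rule: substring in common (every selection is the same literal text).
    if len(set(chosen)) == 1:
        base.append(re.escape(chosen[0]))
    # Rule: quantity of selected characters, regardless of class.
    if same_length:
        base.append(f".{{{len(chosen[0])}}}")
    # Rule: class of selected characters, with a fixed count when lengths agree.
    for cls, members in CLASSES:
        if all(c != "" and set(c) <= members for c in chosen):
            if same_length:
                base.append(f"{cls}{{{len(chosen[0])}}}")
            base.append(f"{cls}+")
    # Rule: relative position, simplified here to anchoring at the start/end of the string.
    anchored: list[str] = []
    if all(s.start == 0 for s in selections):
        anchored += ["^" + p for p in base]
    if all(s.end == len(s.text) for s in selections):
        anchored += [p + "$" for p in base]
    return base + anchored


def reproduces(pattern: str, sel: Selection) -> bool:
    """True if the pattern's first match in sel.text is exactly the user's selection."""
    m = re.search(pattern, sel.text)
    return m is not None and m.span() == (sel.start, sel.end)


def identify_patterns(selections: list[Selection]) -> list[str]:
    """Keep only the candidates that are consistent with every selection."""
    return [p for p in candidate_patterns(selections)
            if all(reproduces(p, s) for s in selections)]


if __name__ == "__main__":
    sels = [Selection("ORD-4821 shipped", 4, 8), Selection("ORD-9930 pending", 4, 8)]
    print(identify_patterns(sels))
```

Consider two sample strings where the user highlights the four digits after `ORD-`. The sketch keeps `[0-9]{4}`, `[0-9]+` and `[A-Za-z0-9]{4}`. It discards `.{4}`, which would first match `ORD-`, and `[A-Za-z0-9]+`, which would first match `ORD`. Several candidates survive, so a verification step like the second user interface in the abstract is still needed to pick one.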
Opening claim text (preview).
Therefore, the following is claimed:

1. A non-transitory computer-readable medium embodying a program executable in a computing device, the program comprising: code that, in response to receiving a data set of strings organized into a plurality of rows and a plurality of columns from a client, selects a proper subset of the data set of strings from individual rows of a randomly selected plurality of the plurality of rows corresponding to a predefined plurality of the plurality of columns for the randomly selected plurality of the plurality of rows; code that generates a first user interface configured to present individual strings from the proper subset of the data set of strings; code that sends data encoding the first user interface to the client; code that, in response to receiving a plurality of substring selections from the client by way of the first user interface, identifies a plurality of selection patterns based at least in part on the plurality of substring selections and a set of pattern rules, individual selections of the plurality of substring selections corresponding to individual strings of the proper subset of the data set of strings, individual selection patterns of the plurality of selection patterns being represented separately by at least two of the individual selections of the plurality of substring selections; code that generates a second user interface configured to present a ranked listing of the plurality of selection patterns according to a weight associated with the individual selection patterns of the plurality of selection patterns; code that sends data encoding the second user interface to the client; and code that, in response to obtaining a selection of one of the plurality of selection patterns from the client, processes the data set of strings by applying the one of the plurality of selection patterns to the data set of strings.

2. The non-transitory computer-readable medium of claim 1, wherein the first user interface and the second user interface comprise network pages.

3. The non-transitory computer-readable medium of claim 1, wherein at least one of the plurality of selection patterns includes at least one of: a quantity of selected characters, a class of selected characters, a relative substring position within a string, or a substring in common.

4. The non-transitory computer-readable medium of claim 1, wherein the second user interface is further configured to present a sample application of at least one of the plurality of selection patterns to another proper subset of the data set of strings which is disjoint from the proper subset.

5. The non-transitory computer-readable medium of claim 1, wherein the second user interface is further configured to present at least one regular expression corresponding to the individual selection patterns.

6. A system, comprising: at least one computing device comprising a processor and a memory; and at least one application executable in the at least one computing device, the at least one application comprising: logic that selects a proper subset of a set of strings, the set of strings being organized into a plurality of rows and a plurality of columns, the proper subset being selected from individual rows of a randomly selected set of the plurality of rows corresponding to a predefined set of the plurality of columns for the randomly selected set of the plurality of rows; logic that generates a first user interface that is configured to present the proper subset of the set of strings, the first user interface being further configured to obtain a plurality of substring selections, individual substring selections of the plurality of substring selections corresponding to individual strings of the proper subset of the set of strings; logic that identifies a plurality of selection patterns based at least in part on the plurality of substring selections and a set of rules, individual selection patterns of the plurality of selection patterns being represented separately by the individual substring selections of the plurality of substring selections; logic that generates a second user interface that is configured to present a ranked listing of the plurality of selection patterns according to a relative weight associated with the individual selection patterns of the plurality of selection patterns; and logic that processes the set of strings by applying a selected selection pattern of the plurality of selection patterns to the set of strings.

7. The system of claim 6, wherein the at least one application further comprises: logic that sends data encoding the first user interface to a client; and logic that, in response to obtaining the plurality of substring selections from the client, sends data encoding the second user interface to the client.

8. The system of claim 6, wherein the set of strings is organized into a table having the plurality of rows and the plurality of columns.

9. The system of claim 6, wherein the second user interface is further configured to present a result of the individual selection patterns being applied to another proper subset of the set of strings, the other proper subset being randomly selected from the set of strings and being disjoint from the proper subset.

10. The system of claim 6, wherein the second user interface is further configured to present selection code corresponding to the individual selection patterns.

11. The system of claim 10, wherein the selection code comprises a regular expression.

12. The system of claim 6, wherein the second user interface includes a component configured to cause another proper subset of the set of strings to be selected and presented in the first user interface.

13. The system of claim 6, wherein the relative weight has a predetermined component.

14. The system of claim 6, wherein the relative weight has a dynamic component indicating a relative confidence by the logic that identifies.

15. The system of claim 6, wherein at least one of the plurality of selection patterns includes a quantity of selected characters.

16. The system of claim 6, wherein at least one of the plurality of selection patterns includes a class of selected characters.

17. The system of claim 6, wherein at least one of the plurality of selection patterns includes a relative substring position within a string.

18. The system of claim 6, wherein at least one of the plurality of selection patterns includes a substring in common.

19. A method, comprising: receiving, via at least one of one or more computing devices, a data set of strings organized into a plurality of rows and a plurality of columns; randomly selecting, via at least one of the one or more computing devices, a proper subset of the data set of strings from at least one predefined column of the plurality of columns; generating, via at least one of the one or more computing devices, a first user interface configured to present individual strings from the proper subset of the data set of strings; receiving, via at least one of the one or more computing devices, a plurality of substring selections corresponding to the individual strings from the proper subset of the data set of strings by way of the first user interface; identifying, via at least one of the one or more computing devices, a plurality of selection patterns based at least in part on the plurality of substring selections and a set of pattern rules, individual selection patterns of the plurality of selection patterns being separately represented in individual substring selections …
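Claims 1, 4, 6, 9 and 19 describe three data-handling steps:

- drawing a random proper subset of rows from predefined columns;
- previewing a pattern on a second random subset that is disjoint from the first;
- applying the chosen pattern to the whole data set.

A minimal Python sketch of these steps follows. It rests on several assumptions:

- rows are dicts keyed by hypothetical column names;
- only one predefined column is sampled;
- `sample_proper_subset` and `apply_pattern` are illustrative names;
- "applying" a pattern means extracting its first match from each string.

The user interfaces themselves are omitted.

```python
import random
import re


def sample_proper_subset(rows, column, k, exclude=frozenset(), rng=None):
    """Randomly pick k rows not in `exclude`; return (row_index, cell) pairs for one column.

    `rows` is a list of dicts (a hypothetical table layout). k must be smaller than
    the number of rows so that the result is a proper subset of the column's strings.
    """
    if not 0 < k < len(rows):
        raise ValueError("k must satisfy 0 < k < len(rows) for a proper subset")
    rng = rng or random.Random()
    pool = [i for i in range(len(rows)) if i not in exclude]
    picked = rng.sample(pool, k)  # raises ValueError if the pool is smaller than k
    return [(i, rows[i][column]) for i in sorted(picked)]


def apply_pattern(pattern, cells):
    """Extract the first match of the chosen pattern from each cell (None when absent)."""
    compiled = re.compile(pattern)
    out = []
    for cell in cells:
        m = compiled.search(cell)
        out.append(m.group(0) if m else None)
    return out


if __name__ == "__main__":
    table = [{"note": f"ORD-{1000 + i} shipped", "qty": str(i)} for i in range(10)]
    rng = random.Random(42)
    # Strings shown in the first user interface.
    shown = sample_proper_subset(table, "note", 3, rng=rng)
    # Disjoint preview subset for the second user interface.
    preview = sample_proper_subset(table, "note", 3,
                                   exclude={i for i, _ in shown}, rng=rng)
    print(apply_pattern(r"[0-9]{4}", [cell for _, cell in preview]))
    # Final step: apply the selected pattern to the full data set.
    print(apply_pattern(r"[0-9]{4}", [row["note"] for row in table]))
```

Passing a seeded `random.Random` makes the samples reproducible. Passing the first sample's row indices as `exclude` is one simple way to keep the preview subset disjoint from the strings the user already annotated, which claims 4 and 9 require.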
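Claims 1 and 6 rank the selection patterns by a weight. Claims 13 and 14 split that weight into a predetermined component and a dynamic confidence component. The patent text here does not give formulas, so the sketch below is one hypothetical reading:

- the per-rule priors in `RULE_PRIOR` are made-up constants;
- the confidence is assumed to be the fraction of the user's selections that a regex reproduces with its first match;
- patterns supported by fewer than two selections are dropped, echoing the "at least two" wording of claim 1.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    regex: str
    rule: str  # hypothetical label for the rule family that proposed the regex


# Hypothetical predetermined component: a fixed prior per rule family.
RULE_PRIOR = {"common": 0.30, "class_quantity": 0.25, "class": 0.15, "quantity": 0.05}


def rank_patterns(candidates, selections, min_support=2):
    """Return [(weight, regex)] sorted by weight, highest first.

    selections: list of (text, start, end) spans the user highlighted.
    weight = predetermined prior + dynamic confidence, where confidence is the
    fraction of selections whose span the regex's first match reproduces.
    """
    ranked = []
    for cand in candidates:
        support = 0
        for text, start, end in selections:
            m = re.search(cand.regex, text)
            if m is not None and m.span() == (start, end):
                support += 1
        if support < min_support:  # must be represented by at least two selections
            continue
        confidence = support / len(selections)
        ranked.append((RULE_PRIOR.get(cand.rule, 0.0) + confidence, cand.regex))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    sels = [("ORD-4821 shipped", 4, 8), ("ORD-9930 pending", 4, 8), ("ref 77 ORD-1234", 11, 15)]
    cands = [Candidate("[0-9]+", "class"), Candidate("[0-9]{4}", "class_quantity"),
             Candidate(".{4}", "quantity")]
    for weight, regex in rank_patterns(cands, sels):
        print(f"{weight:.2f}  {regex}")
```

With these three selections, `[0-9]{4}` reproduces all three and ranks first with a weight of 1.25. `[0-9]+` misses the third string, where it first matches `77`, and scores about 0.82. `.{4}` reproduces none of them and is dropped.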
CPC classification titles:
- using system suggestions (G06F16/3325 takes precedence)
- based on web technology, e.g. hypertext transfer protocol [HTTP]
- by using string matching techniques
- Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- Interaction techniques based on graphical user interfaces [GUI]