Method and apparatus for extracting web page content

US9934206B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9934206-B2
Application number: US-201414341446-A
Country: US
Kind code: B2
Filing date: Jul 25, 2014
Priority date: Mar 27, 2013
Publication date: Apr 3, 2018
Grant date: Apr 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus for extracting web page content are provided herein. An exemplary method can be implemented by a mobile terminal. A request command to open a first web page can be received. Whether a source code contains text content tags can be determined. When the source code corresponding to the first web page contains the text content tags, text content of the first web page enclosed within the text content tags can be extracted by a reader. When the source code does not contain the text content tags, a start position and an end position to indicate the text content of the first web page can be identified in the source code. The text content tags can be respectively added after the start position and before the end position. The text content of the first web page enclosed within the text content tags can then be extracted.
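
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the start and end positions are identified, so `find_text_bounds` is a hypothetical heuristic (first `<p>` through last `</p>`), and all function names here are assumptions chosen for the example. The `article` element stands in for the "text content tags", which the claims say may include article tags.

```python
import re
from html.parser import HTMLParser

ARTICLE_OPEN = re.compile(r"<article\b[^>]*>", re.IGNORECASE)
ARTICLE_CLOSE = re.compile(r"</article\s*>", re.IGNORECASE)


def has_article_tags(source: str) -> bool:
    """True when the source has both an opening and a closing article tag."""
    return bool(ARTICLE_OPEN.search(source) and ARTICLE_CLOSE.search(source))


def find_text_bounds(source: str):
    """Hypothetical heuristic (an assumption, not from the patent):
    treat the span from the first <p> to the last </p> as the text content."""
    first = re.search(r"<p\b", source, re.IGNORECASE)
    closes = list(re.finditer(r"</p\s*>", source, re.IGNORECASE))
    if first is None or not closes:
        return 0, len(source)
    return first.start(), closes[-1].end()


def add_article_tags(source: str, start: int, end: int) -> str:
    """Insert the opening tag at the start position and the closing tag at the end."""
    return (source[:start] + "<article>" + source[start:end]
            + "</article>" + source[end:])


class _ArticleText(HTMLParser):
    """Collects text nodes that appear inside <article>...</article>."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(source: str) -> str:
    """Extract text enclosed in article tags, adding the tags first if missing."""
    if not has_article_tags(source):
        start, end = find_text_bounds(source)
        source = add_article_tags(source, start, end)
    parser = _ArticleText()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.parts)
```

Calling `extract_text` on a page that already contains article tags goes straight to extraction; a page without them is tagged first, so the same extraction step serves both branches.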

First claim

Opening claim text (preview).

What is claimed is:

1. A method for extracting web page content, implemented by a mobile terminal, comprising: receiving a request command to open a first web page; determining whether a source code corresponding to the first web page contains text content tags; and when the source code corresponding to the first web page is determined to contain the text content tags: extracting text content of the first web page enclosed within the text content tags by a reader; or when the source code corresponding to the first web page is determined not to contain the text content tags: identifying in the source code a start position and an end position to indicate the text content of the first web page; respectively adding the text content tags after the start position and before the end position; and extracting the text content of the first web page enclosed within the text content tags, wherein, the method further comprises: before extracting the text content of the first web page, extracting a title of the text content of the first webpage; and simultaneously displaying the title and a reader button in a browser address bar of the first webpage, wherein the text content of the first web page is extracted in response to the reader button being triggered.

2. The method according to claim 1, wherein the text content tags include article tags.

3. The method according to claim 2, wherein the determining of whether the source code corresponding to the first web page contains the text content tags includes: obtaining the source code corresponding to the first web page, when receiving the request command to open the first web page; and determining, by a rendering engine of a browser installed in the mobile terminal, whether the source code contains the article tags.

4. The method according to claim 2, wherein the extracting of the text content of the first web page enclosed within the text content tags by the reader includes: receiving request information to open the reader; and opening the reader to extract the text content of the first web page enclosed within the article tags.

5. The method according to claim 4, further including: displaying the extracted text content of the first web page on a second web page, the second web page being different from the first web page.

6. The method according to claim 5, wherein the text content is displayed on a touch screen, the method further including: when a last page of the text content is displayed, detecting a touch operation for continuously sliding the text content upward by a user; and when the user is detected to be off from the touch screen, generating and displaying a first animation, the first animation including an animation bounced back from releasing of a continuous upward pulling of the last page of the text content.

7. The method according to claim 5, wherein, after the displaying of the extracted text content of the first web page on the second web page, the method further includes: receiving a request command to return to the reader; and closing the second web page.

8. The method according to claim 7, wherein: the displaying of the extracted text content of the first web page on the second web page includes sliding upward the second web page displaying the text content of the first web page; and the closing of the second web page includes sliding the second web page downward into the first web page.

9. An apparatus for extracting web page content, comprising: a memory, and a processor coupled to the memory, the processor being configured to: when a request command to open a first web page is received, determine whether a source code corresponding to the first web page contains text content tags; and when the source code corresponding to the first web page is determined to contain the text content tags: extract text content of the first web page enclosed within the text content tags by a reader; or when the source code corresponding to the first web page is determined not to contain the text content tags: identify a start position and an end position to indicate the text content of the first web page in the source code; respectively add the text content tags after the start position and before the end position; and extract the text content of the first web page enclosed within the text content tags, wherein the processor is further configured to: before extracting the text content of the first web page, extract a title of the text content of the first webpage; and simultaneously display the title and a reader button in a browser address bar of the first webpage, wherein the text content of the first web page is extracted in response to the reader button being triggered.

10. The system according to claim 9, wherein the text content tags include article tags.

11. The system according to claim 10, wherein the processor is further configured to: obtain the source code corresponding to the first web page, when the request command to open the first web page is received; and determine, by a rendering engine of a browser installed in the mobile terminal, whether the source code contains the text content tags.

12. The system according to claim 10, wherein the processor is further configured to: receive request information to open the reader; and open the reader to extract the text content of the first web page enclosed within the text content tags.

13. The system according to claim 12, wherein the processor is further configured to: display the extracted text content of the first web page on a second web page, the second web page being different from the first web page.

14. The system according to claim 13, wherein the text content is displayed on a touch screen, and the processor is further configured to: detect a touch operation for continuously sliding the text content upward by a user, when a last page of the text content is displayed; and when the user is detected to be off from the touch screen, generate a first animation for displaying, the first animation including an animation bounced back from releasing of a continuous upward pulling of the last page of the text content, display the first animation.

15. The system according to claim 13, wherein the processor is further configured to: receive a request command to return to the reader; and close the second web page.

16. The system according to claim 15, wherein the processor is further configured to: slide upward the second web page displaying the text content of the first web page; and slide the second web page downward into the first web page.

17. The method according to claim 1, wherein simultaneously displaying the title and the reader button in the browser address bar further comprising: displaying, in a first space of the browser address bar, a preset number of characters in the title; and displaying, in a remaining space of the browser address bar, the reader button.

18. The method according to claim 1, further comprising: when the title is being extracted, obtaining a title tag reference of the first webpage using a document object in an html specification; and calling an innerHTML method of the title tag to obtain the title of the text content of the first webpage.

19. The method according to claim 1, further comprising: locally saving a standard html page as a reader template page in a browser, a structure of the reader template page including a title portion and a text content portion; and after the title and the text content are
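
Claims 17 and 18 describe reading the page title and showing a preset number of its characters next to a reader button in the address bar. In a browser the title would come from the document object and the title element's `innerHTML`, as claim 18 states; the sketch below is only an offline analogue using Python's standard `html.parser`. The names `extract_title` and `address_bar_label`, the `max_chars` default, the trailing `...` marker, and the `[Reader]` placeholder for the button are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(source: str) -> str:
    """Return the page title, or an empty string when there is none."""
    parser = _TitleParser()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts).strip()


def address_bar_label(title: str, max_chars: int = 12,
                      button: str = "[Reader]") -> str:
    """Show at most max_chars of the title, then the reader button.

    The '...' marker for a shortened title is an illustrative choice.
    """
    shown = title if len(title) <= max_chars else title[:max_chars] + "..."
    return f"{shown} {button}"


page = "<html><head><title> Daily News: Markets Rally </title></head><body></body></html>"
title = extract_title(page)
print(address_bar_label(title, max_chars=10))  # prints "Daily News... [Reader]"
```

Keeping the character limit as a parameter mirrors the "preset number of characters" wording: the title takes a fixed first space and the button occupies whatever remains.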

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Optimising the visualization of content, e.g. distillation of HTML documents · CPC title

  • Interaction techniques to control parameter settings, e.g. interaction with sliders or dials · CPC title

  • Hyperlinking · CPC title

  • Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title

  • using a touch-screen or digitiser, e.g. input of commands through traced gestures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9934206B2 cover?
Methods and apparatus for extracting web page content are provided herein. An exemplary method can be implemented by a mobile terminal. A request command to open a first web page can be received. Whether a source code contains text content tags can be determined. When the source code corresponding to the first web page contains the text content tags, text content of the first web page enclosed …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F17/2247. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 3, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).