Parsing Javascript simply
Graham Cox
Hi all,
I made an app that scrapes web pages looking for a specific tag, namely the <video> tag, to get the address of a video stream. If you display the page in, e.g., Safari, the video portion can be right-clicked, which gives you a “Copy Video Address” menu item that extracts the URL. My app is intended to obtain that same URL.

On some sites, the video address is generated by obfuscating JavaScript rather than simply embedded in the HTML of the page. Nevertheless, once the page is displayed, Safari still allows you to manually copy the video stream’s URL, so the obfuscation doesn’t provide any real security; it just makes my page-scraping effort more difficult.

My question is: is there a way to run the JavaScript using some built-in classes to get the final page rendering, so that the video address can be obtained? I don’t want to write my own JavaScript parser; that’s crazy. Besides, I don’t know JavaScript very well, so I’d rather leave it to some existing code and then make use of its output. Is this possible, or what strategy can I use?

--Graham
Quincey Morris
On Dec 2, 2019, at 17:35, Graham Cox <graham@...> wrote:
I tried something similar recently, and got better-than-anticipated results using this NSAttributedString initializer:

IIRC, this gave me an attributed string with (JavaScript-generated) links embedded, so you might have to scan for “a” tags instead of “video” tags. I dunno, it might not work for you, but it should be simple to try.
Sandor Szatmari
Did you try loading the page in a WebView? Then you should be able to traverse the DOM.
Sandor
On Dec 2, 2019, at 20:36, Graham Cox <graham@...> wrote:
Graham Cox
Thanks - indeed simple to try, but unfortunately doesn’t work in this case.
I’ll try loading it into an off-screen WKWebView.
--Graham
Sandor Szatmari
We use a WebView/WKWebView for accessing web page DOMs using XPath queries. We used to be able to even interact with Java Applets, but security improvements have killed that functionality after 10.6. And Applets are a dead technology for the most part. We still support 10.6 to interact with legacy web pages. Please, I hope they go away! But a great deal can still be accomplished this way. Using the delegates will let you know when the page loads are complete and the DOM is in its final state. Make sure to query after that to ensure the JavaScript is loaded/interpreted.
Sandor
On Dec 3, 2019, at 17:48, Graham Cox <graham@...> wrote:
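The approach discussed in the thread (load the page in an off-screen WKWebView, wait for the navigation delegate to report completion, then query the DOM) might look roughly like the following. This is only a minimal sketch under assumptions: the class name `VideoURLScraper`, its `scrape(_:completion:)` method, and the embedded JavaScript snippet are hypothetical and do not come from any of the posters. It also assumes the page has inserted its <video> element by the time `webView(_:didFinish:)` is called; pages that build the element later would need a delay or a re-query.

```swift
import Foundation
import WebKit

/// Hypothetical helper: loads a page off-screen and reports the source URLs
/// of any <video> (or nested <source>) elements once navigation finishes.
final class VideoURLScraper: NSObject, WKNavigationDelegate {
    /// JavaScript evaluated inside the loaded page. It collects the
    /// `currentSrc` (or, failing that, `src`) of each matching element.
    static let script = """
    Array.from(document.querySelectorAll('video, video source'))
        .map(function (e) { return e.currentSrc || e.src; })
        .filter(function (s) { return !!s; })
    """

    // Create and use this object on the main thread, as WKWebView requires.
    private let webView = WKWebView(frame: .zero)
    private var completion: (([String]) -> Void)?

    /// Starts loading `url`; `completion` receives the URLs found (possibly none).
    func scrape(_ url: URL, completion: @escaping ([String]) -> Void) {
        self.completion = completion
        webView.navigationDelegate = self
        webView.load(URLRequest(url: url))
    }

    // Called when the main-frame navigation has finished loading.
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        webView.evaluateJavaScript(VideoURLScraper.script) { [weak self] result, _ in
            // A JavaScript array of strings arrives as an array of NSString.
            self?.completion?(result as? [String] ?? [])
            self?.completion = nil
        }
    }

    // Called when a navigation that had already started committing fails.
    func webView(_ webView: WKWebView, didFail navigation: WKNavigation!, withError error: Error) {
        completion?([])
        completion = nil
    }
}
```

The caller has to keep a strong reference to the scraper until the completion handler runs, since the web view holds its navigation delegate weakly. A fuller version would probably also handle `webView(_:didFailProvisionalNavigation:withError:)`, which covers failures that occur before the page starts loading.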