Parsing JavaScript simply


Sandor Szatmari
 

We use a WebView/WKWebView to access web page DOMs with XPath queries. We used to be able to interact even with Java applets, but security improvements killed that functionality after 10.6, and applets are a dead technology for the most part anyway. We still support 10.6 so that we can interact with legacy web pages. Please, I hope they go away! But a great deal can still be accomplished this way. The delegate callbacks let you know when the page load is complete and the DOM is in its final state. Make sure to query only after that point, so that the JavaScript has been loaded and interpreted.
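As a rough illustration of the approach described above, here is a minimal Swift sketch for the WKWebView case only (the legacy WebView uses a different delegate API). It waits for the navigation delegate's `webView(_:didFinish:)` callback and then runs an XPath query inside the page through `document.evaluate`. The names `xpathQueryScript` and `VideoURLScraper`, and the XPath expression itself, are hypothetical and chosen just for this example.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

/// Builds JavaScript that runs an XPath query in the page and returns the
/// node values of all matches as an array of strings.
func xpathQueryScript(_ xpath: String) -> String {
    // Encode the XPath as a JSON array literal so that quotes and backslashes
    // inside it are escaped safely for JavaScript.
    let data = try! JSONSerialization.data(withJSONObject: [xpath])
    let arrayLiteral = String(data: data, encoding: .utf8)!
    return """
    (function () {
        var xpath = \(arrayLiteral)[0];
        var result = document.evaluate(xpath, document, null,
                                       XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
        var values = [];
        for (var i = 0; i < result.snapshotLength; i++) {
            values.push(result.snapshotItem(i).nodeValue);
        }
        return values;
    })()
    """
}

#if canImport(WebKit)
/// Loads a page in an off-screen WKWebView and queries the DOM once loading ends.
@MainActor
final class VideoURLScraper: NSObject, WKNavigationDelegate {
    private let webView = WKWebView(frame: .zero,
                                    configuration: WKWebViewConfiguration())
    private let completion: ([String]) -> Void

    init(completion: @escaping ([String]) -> Void) {
        self.completion = completion
        super.init()
        webView.navigationDelegate = self
    }

    func load(_ url: URL) {
        webView.load(URLRequest(url: url))
    }

    // Called when the main frame has finished loading; query only from here on.
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        // Assumed XPath: src attributes on <video> and on its <source> children.
        let script = xpathQueryScript("//video/@src | //video/source/@src")
        webView.evaluateJavaScript(script) { result, _ in
            self.completion(result as? [String] ?? [])
        }
    }
}
#endif
```

The caller has to keep the `VideoURLScraper` instance alive until the completion closure runs. One caveat: `didFinish` reports that the load has finished, but scripts that do their work asynchronously can still change the DOM afterwards, so an empty result may call for a short delay and a second query.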

Sandor

On Dec 3, 2019, at 17:48, Graham Cox <graham@...> wrote:

Thanks - indeed simple to try, but unfortunately doesn’t work in this case.

I’ll try loading it into an off-screen WKWebView.

—Graham






Graham Cox
 

Thanks - indeed simple to try, but unfortunately doesn’t work in this case.

I’ll try loading it into an off-screen WKWebView.

—Graham



On 3 Dec 2019, at 12:47 pm, Quincey Morris <quinceymorris@...> wrote:

I tried something similar recently, and got better-than-anticipated results using this NSAttributedString initializer:


IIRC, this gave me an attributed string with (JavaScript-generated) links embedded, so you might have to scan for “a” tags instead of “video” tags. I dunno, it might not work for you, but it should be simple to try.



Sandor Szatmari
 

Did you try loading the page in a WebView? Then you should be able to traverse the DOM.
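A minimal sketch of what traversing the DOM could look like once the page has loaded in a WKWebView: a small script collects the resolved source of every `<video>` element, and a helper turns the untyped script result into URLs. The names `videoSourcesScript`, `streamURLs(fromScriptResult:)`, and `collectVideoURLs(in:completion:)` are hypothetical, and keeping only http(s) URLs is an assumption made for this example (sites that stream through Media Source Extensions typically expose a `blob:` URL, which is not usable outside the page).

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

/// JavaScript that walks the rendered DOM and returns, for each <video> and
/// nested <source> element, the URL the browser resolved for it.
let videoSourcesScript = """
Array.from(document.querySelectorAll('video, video source'))
    .map(function (el) { return el.currentSrc || el.src || ''; })
"""

/// Converts the untyped result of the script into unique http(s) URLs,
/// preserving the order in which they appeared in the page.
func streamURLs(fromScriptResult result: Any?) -> [URL] {
    guard let strings = result as? [String] else { return [] }
    var seen = Set<URL>()
    return strings.compactMap { string -> URL? in
        guard !string.isEmpty,
              let url = URL(string: string),
              let scheme = url.scheme?.lowercased(),
              scheme == "http" || scheme == "https" else { return nil }
        // Drop duplicates, e.g. a <video> and its <source> reporting the same URL.
        return seen.insert(url).inserted ? url : nil
    }
}

#if canImport(WebKit)
/// Intended to be called after the navigation delegate reports that the
/// page has finished loading.
@MainActor
func collectVideoURLs(in webView: WKWebView,
                      completion: @escaping ([URL]) -> Void) {
    webView.evaluateJavaScript(videoSourcesScript) { result, _ in
        completion(streamURLs(fromScriptResult: result))
    }
}
#endif
```

Reading `currentSrc` rather than the `src` attribute matters here because a script may assign the address as a property without ever writing it into the HTML.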

Sandor

On Dec 2, 2019, at 20:36, Graham Cox <graham@mapdiva.com> wrote:

Hi all,

I made an app that scrapes web pages looking for a specific tag, namely the <video> tag, to get the address of a video stream. If you display the page in, e.g., Safari, the video portion can be right-clicked, which gives you a “Copy Video Address” menu item that extracts the URL. My app is intended to obtain that same URL.

On some sites, the video address is generated by obfuscating JavaScript rather than simply embedded in the page’s HTML. Nevertheless, once the page is displayed, Safari still lets you copy the video stream’s URL manually, so the obfuscation doesn’t provide any real security; it just makes my page scraping more difficult.

My question is: is there a way to run the JavaScript using built-in classes to get the final rendered page, so that the video address can be obtained? I don’t want to write my own JavaScript parser; that’s crazy. Besides, I don’t know JavaScript very well, so I’d rather leave it to existing code and then make use of its output. Is this possible, or what strategy can I use?

—Graham





Quincey Morris
 

On Dec 2, 2019, at 17:35 , Graham Cox <graham@...> wrote:

My question is: is there a way to run the JavaScript using built-in classes to get the final rendered page, so that the video address can be obtained? I don’t want to write my own JavaScript parser; that’s crazy. Besides, I don’t know JavaScript very well, so I’d rather leave it to existing code and then make use of its output. Is this possible, or what strategy can I use?

I tried something similar recently, and got better-than-anticipated results using this NSAttributedString initializer:


IIRC, this gave me an attributed string with (JavaScript-generated) links embedded, so you might have to scan for “a” tags instead of “video” tags. I dunno, it might not work for you, but it should be simple to try.


Graham Cox
 

Hi all,

I made an app that scrapes web pages looking for a specific tag, namely the <video> tag, to get the address of a video stream. If you display the page in, e.g., Safari, the video portion can be right-clicked, which gives you a “Copy Video Address” menu item that extracts the URL. My app is intended to obtain that same URL.

On some sites, the video address is generated by obfuscating JavaScript rather than simply embedded in the page’s HTML. Nevertheless, once the page is displayed, Safari still lets you copy the video stream’s URL manually, so the obfuscation doesn’t provide any real security; it just makes my page scraping more difficult.

My question is: is there a way to run the JavaScript using built-in classes to get the final rendered page, so that the video address can be obtained? I don’t want to write my own JavaScript parser; that’s crazy. Besides, I don’t know JavaScript very well, so I’d rather leave it to existing code and then make use of its output. Is this possible, or what strategy can I use?

—Graham