How to get DOMDocument from WKWebView


Gerriet M. Denkmann
 

WebView is sort of deprecated.
“In apps that run in OS X 10.10 and later, use the WKWebView class instead of using WebView”

But: how can I get the DOMDocument of a WKWebView?

With WebView, I navigate the DOM hierarchy to get the interesting stuff.

Gerriet.


Keary Suska
 

In short, you can’t. WKWebView does not permit direct access to the DOM, nor does it expose how the DOM is parsed internally. Your access is limited to injecting JavaScript and executing it against the DOM, plus some ability to trap events, but that’s it, AFAIK.

Keary Suska
Esoteritech, Inc.
"Demystifying technology for your home or business"


Jens Alfke
 

On Jul 7, 2018, at 8:56 AM, Gerriet M. Denkmann <g@...> wrote:

But: how can I get the DOMDocument of a WkWebView?
I navigate (using WebView) the DOM hierarchy to get the interesting stuff.

I do the same thing (in an app I've been tinkering with for years), but it's a dead end. WKWebView runs the web content in a separate process, for security reasons, which means its DOM objects are completely inaccessible from the app.

The best workaround seems to be to write the DOM stuff in JavaScript, and use the hooks Keary mentioned to communicate messages back and forth to the native app. In other words, it's a lot like a modern web app, except using those hooks instead of XHR.

—Jens
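
The message hook mentioned above can be sketched roughly as follows. This is only a minimal illustration, not code from anyone in this thread: the handler name `domBridge`, the class `BridgeReceiver`, and the helper functions are hypothetical names chosen for the example. The WebKit calls it relies on are `WKUserContentController.add(_:name:)`, the `WKScriptMessageHandler` protocol, and the page-side `window.webkit.messageHandlers.<name>.postMessage(...)`.

```swift
import WebKit

// Hypothetical handler name; the page-side script must use the same one.
let handlerName = "domBridge"

// Builds JavaScript that runs inside the page and posts the document title
// to the native side through the WKWebView message-handler hook.
func postTitleScript(handler: String) -> String {
    return "window.webkit.messageHandlers.\(handler).postMessage(document.title);"
}

// Receives whatever the page posts to the named handler.
final class BridgeReceiver: NSObject, WKScriptMessageHandler {
    var onMessage: ((Any) -> Void)?

    // Called by WebKit whenever page script posts to the handler.
    func userContentController(_ userContentController: WKUserContentController,
                               didReceive message: WKScriptMessage) {
        onMessage?(message.body)
    }
}

// Creates a web view whose pages can reach `receiver` under `handlerName`.
func makeBridgedWebView(receiver: BridgeReceiver) -> WKWebView {
    let configuration = WKWebViewConfiguration()
    configuration.userContentController.add(receiver, name: handlerName)
    return WKWebView(frame: .zero, configuration: configuration)
}
```

Once a page has finished loading, calling `webView.evaluateJavaScript(postTitleScript(handler: handlerName), completionHandler: nil)` on the main thread should deliver the title to `onMessage`. Note that the content controller keeps a strong reference to the handler it is given.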


Gerriet M. Denkmann
 

On 9 Jul 2018, at 23:41, Jens Alfke <jens@mooseyard.com> wrote:

The best workaround seems to be to write the DOM stuff in JavaScript, and use the hooks Keary mentioned to communicate messages back and forth to the native app. In other words, it's a lot like a modern web app, except using those hooks instead of XHR.

—Jens

If the web page I am interested in is *not* under my control, could I still “write the DOM stuff in JavaScript”?

If so: how would I start doing this?

Currently I am doing stuff like:

get webString from web via URLSession
webView.loadHTMLString(webString, baseURL: nil)

in webView(_ sender: WebView!, didFinishLoadFor frame: WebFrame!) then do:
curr = domDocument
curr = child of curr which is DOMHTMLHtmlElement
curr = child of curr which is DOMHTMLBodyElement
curr = child of curr which is DOMHTMLDivElement and has idName = “something”
curr = child of curr which is DOMHTMLElement and has tagName = “MAIN”
etc.
htmlString = curr.innerHTML
add some prefix and postfix to htmlString
wkWebView.loadHTMLString(htmlString, baseURL: nil)

Gerriet.
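
For comparison, the steps listed above might look something like this with WKWebView alone, letting JavaScript do the DOM walk. This is a sketch under assumptions, not a tested replacement: the selector `div#something > main` only approximates the path in the pseudocode (the "etc." steps are unknown), and `Extractor`, `innerHTMLScript`, and `javaScriptLiteral` are hypothetical names. The WebKit calls used are `loadHTMLString(_:baseURL:)`, `evaluateJavaScript(_:completionHandler:)`, and the `WKNavigationDelegate` method `webView(_:didFinish:)`.

```swift
import WebKit

// Turns a Swift string into a quoted, escaped JavaScript string literal by
// JSON-encoding a one-element array and stripping the surrounding brackets.
func javaScriptLiteral(_ string: String) -> String {
    let data = try! JSONSerialization.data(withJSONObject: [string])
    let json = String(data: data, encoding: .utf8)!
    return String(json.dropFirst().dropLast())
}

// Builds JavaScript that evaluates to the innerHTML of the first element
// matching a CSS selector, or to null when nothing matches.
func innerHTMLScript(selector: String) -> String {
    let literal = javaScriptLiteral(selector)
    return "(function () { var e = document.querySelector(\(literal)); "
         + "return e ? e.innerHTML : null; })()"
}

// Loads HTML into an off-screen WKWebView and hands back the fragment.
final class Extractor: NSObject, WKNavigationDelegate {
    let source = WKWebView(frame: .zero, configuration: WKWebViewConfiguration())
    let selector: String
    let completion: (String?) -> Void

    init(selector: String, completion: @escaping (String?) -> Void) {
        self.selector = selector
        self.completion = completion
        super.init()
        source.navigationDelegate = self
    }

    func load(html: String) {
        source.loadHTMLString(html, baseURL: nil)
    }

    // Plays the role of the old didFinishLoadFor callback.
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        webView.evaluateJavaScript(innerHTMLScript(selector: selector)) { result, _ in
            // A JavaScript null arrives as NSNull, so the cast yields nil.
            self.completion(result as? String)
        }
    }
}

// Usage sketch (main thread, inside an app with a run loop; keep a reference
// to the extractor, because the navigation delegate is held weakly):
// let extractor = Extractor(selector: "div#something > main") { fragment in
//     guard let fragment = fragment else { return }
//     displayView.loadHTMLString(prefix + fragment + postfix, baseURL: nil)
// }
// extractor.load(html: webString)
```

Since the script is evaluated from the app rather than served by the page, this does not require the page to be under your control.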


Keary Suska
 

AFAIK, there are no documented limits, and I believe you aren’t subject to a number of same-origin policies, but this is Apple, so who knows.

Anyway, you inject JavaScript using a WKUserContentController set up as part of the WKWebView configuration. Scripts can return limited information (see WKScriptMessage), and you may need to experiment with returning more complex data. Theoretically you could return a dictionary that represents a DOM tree, but you will have to parse it yourself.

There are third-party libraries you can use to help parse the DOM. You’ll just want to make sure they are “flat” (i.e. single-file) so you avoid same-origin policy issues.

Keary Suska
Esoteritech, Inc.
"Demystifying technology for your home or business"
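
The idea of returning a dictionary that represents a DOM tree could be sketched along these lines. Everything named here is hypothetical and chosen for illustration: the handler name `domTree`, the `tag`/`id`/`children` keys, and the `DOMNode` and `TreeReceiver` types. The sketch assumes WebKit converts the posted JavaScript object into nested dictionaries and arrays in `WKScriptMessage.body`, and it injects the script with `WKUserScript` at document end.

```swift
import WebKit

// Page-side script: walks element nodes and posts a nested plain object.
let serializeScript = """
(function () {
  function walk(node) {
    var children = [];
    for (var i = 0; i < node.children.length; i++) {
      children.push(walk(node.children[i]));
    }
    return { tag: node.tagName, id: node.id, children: children };
  }
  window.webkit.messageHandlers.domTree.postMessage(walk(document.documentElement));
})();
"""

// Native-side model, parsed by hand from the message body.
struct DOMNode {
    let tag: String
    let id: String
    let children: [DOMNode]

    // Returns nil when the body is not a dictionary with a string "tag".
    init?(body: Any) {
        guard let dict = body as? [String: Any],
              let tag = dict["tag"] as? String else { return nil }
        self.tag = tag
        self.id = dict["id"] as? String ?? ""
        let raw = dict["children"] as? [Any] ?? []
        self.children = raw.compactMap { DOMNode(body: $0) }
    }
}

final class TreeReceiver: NSObject, WKScriptMessageHandler {
    var onTree: ((DOMNode) -> Void)?

    func userContentController(_ userContentController: WKUserContentController,
                               didReceive message: WKScriptMessage) {
        if let tree = DOMNode(body: message.body) { onTree?(tree) }
    }
}

// Web view that runs serializeScript on every main-frame page it loads.
func makeTreeWebView(receiver: TreeReceiver) -> WKWebView {
    let controller = WKUserContentController()
    controller.add(receiver, name: "domTree")
    controller.addUserScript(WKUserScript(source: serializeScript,
                                          injectionTime: .atDocumentEnd,
                                          forMainFrameOnly: true))
    let configuration = WKWebViewConfiguration()
    configuration.userContentController = controller
    return WKWebView(frame: .zero, configuration: configuration)
}
```

Only tag names and ids are captured here; attributes or text content would need extra keys in the posted object and matching fields in `DOMNode`.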
