Re: Dealing with unicodes in strings


Graham Cox
 

Yes, I think it’s putting these 6 characters into the string.

The original data is an HTML page, and these strings come from some JavaScript embedded in the page. I’m scraping the page to extract specific bits of information, and it generally works OK except for this minor formatting issue. Although the page declares that it uses UTF-8 encoding, I’m wondering whether that applies even to the embedded JavaScript strings. Perhaps their escape sequences need to be handled the way C string escapes are?

I can write some code to deal with it, but it just seems like something NSString can already do.
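
For reference, a minimal sketch of what the hand-written approach might look like, in plain C so it can be dropped into an Objective-C file. It assumes the text really does contain the literal six characters of a JavaScript-style \uXXXX escape and that the buffer is a mutable, NUL-terminated UTF-8 C string (for example a copy of the NSString's UTF8String). The function names are hypothetical. It does not handle an escaped backslash in front of the "u", and a lone surrogate is passed through as-is rather than rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses exactly four hex digits at p. Returns the value, or -1 if any of
 * the four characters is not a hex digit (this also stops at a NUL). */
static long parse_hex4(const char *p) {
    long v = 0;
    for (int i = 0; i < 4; i++) {
        int d = hex_value(p[i]);
        if (d < 0) return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Writes code point cp at out as UTF-8 and returns the byte count (1 to 4). */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: replaces every \uXXXX escape in s with the UTF-8
 * encoding of that code point, in place. The result is never longer than
 * the input, because a 6-character escape shrinks to at most 3 bytes and
 * a 12-character surrogate pair shrinks to 4 bytes. */
void unescape_unicode(char *s) {
    assert(s != NULL);
    const char *in = s;
    char *out = s;
    while (*in) {
        long unit;
        if (in[0] == '\\' && in[1] == 'u' && (unit = parse_hex4(in + 2)) >= 0) {
            unsigned long cp = (unsigned long)unit;
            in += 6;
            /* A high surrogate followed by an escaped low surrogate forms
             * one code point above U+FFFF. */
            if (cp >= 0xD800 && cp <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
                long low = parse_hex4(in + 2);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10)
                       + ((unsigned long)low - 0xDC00);
                    in += 6;
                }
            }
            out += put_utf8(cp, out);
        } else {
            *out++ = *in++;   /* ordinary byte: copy unchanged */
        }
    }
    *out = '\0';
}
```

With this, a buffer holding the characters 10\u002D20 would end up holding 10-20, and the result can be turned back into an NSString with stringWithUTF8String:.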

—Graham



On 1 Jun 2019, at 12:57 am, Quincey Morris <quinceymorris@...> wrote:

On May 31, 2019, at 07:51 , Glenn L. Austin <glenn@...> wrote:

Is it possible that the code "\u002D" is in the string as the six characters? It is just the minus sign, but could the source have encoded certain characters so they wouldn't be accidentally interpreted?

That’s what I was thinking too.

This would easily be resolved if we could see the bytes of the NSData in hex.
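
A rough sketch of such a hex dump, in plain C so it can be called from Objective-C with the NSData's bytes and length. The function name and the caller-supplied output buffer are assumptions made for illustration, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: formats len bytes as lowercase hex pairs separated
 * by single spaces, e.g. "5c 75 30". Returns 0 on success, or -1 if the
 * output buffer is too small (it needs len * 3 bytes, or 1 byte if len
 * is 0). */
int hex_dump(const void *bytes, size_t len, char *out, size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    const unsigned char *p = bytes;
    size_t needed = len ? len * 3 : 1;
    size_t n = 0;

    assert(bytes != NULL || len == 0);
    assert(out != NULL);
    if (out_size < needed) return -1;

    for (size_t i = 0; i < len; i++) {
        if (i > 0) out[n++] = ' ';          /* separator between bytes */
        out[n++] = digits[p[i] >> 4];       /* high nibble */
        out[n++] = digits[p[i] & 0x0F];     /* low nibble */
    }
    out[n] = '\0';
    return 0;
}
```

If the dump of the suspect region shows 5c 75 30 30 32 44, the data contains the six literal characters \u002D. If it shows a single 2d, the data already contains a plain hyphen-minus and the escape is being introduced somewhere later.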

