Re: Dealing with unicodes in strings


Roland King
 

I don’t really see how NSString would be able to deal with that; it has no way of knowing there are embedded escape sequences in what is otherwise UTF-8. This looks like a fairly standard web encoding of ‘special characters’, which JavaScript is especially fond of. I would just detect \u and treat the next four hex digits as one UTF-16 code unit. You might find an NSString extension or another class that handles web encodings seamlessly, but frankly, if that’s all there is, I’d just deal with it by hand.
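For what it’s worth, doing it by hand could look something like the sketch below. It assumes the escapes always take the form of a backslash, a lowercase u and exactly four hex digits, and that the text contains no escaped backslash in front of a u (JavaScript’s \\u...), which this would decode incorrectly. The function name decodeUnicodeEscapes is made up for illustration; it is not an NSString or Foundation API.

```swift
/// Hypothetical helper: replaces literal `\uXXXX` sequences with the
/// UTF-16 code units they name and leaves everything else untouched.
func decodeUnicodeEscapes(_ input: String) -> String {
    let units = Array(input.utf16)
    var out: [UInt16] = []
    out.reserveCapacity(units.count)

    var i = 0
    while i < units.count {
        // 0x5C is a backslash, 0x75 is a lowercase 'u'.
        if units[i] == 0x5C, i + 5 < units.count, units[i + 1] == 0x75 {
            let hex = String(decoding: units[(i + 2)...(i + 5)], as: UTF16.self)
            // Accept the escape only if all four characters are hex digits.
            if hex.allSatisfy({ $0.isHexDigit }), let value = UInt16(hex, radix: 16) {
                out.append(value)
                i += 6
                continue
            }
        }
        // Not an escape: copy the code unit through unchanged.
        out.append(units[i])
        i += 1
    }

    // Decoding as UTF-16 at the end lets two consecutive escapes that form
    // a surrogate pair combine into a single character.
    return String(decoding: out, as: UTF16.self)
}
```

Working on UTF-16 code units rather than on characters is a deliberate choice here: characters outside the Basic Multilingual Plane arrive as two \u escapes in a row, and they only become a valid character once both halves sit next to each other.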

On 1 Jun 2019, at 08:53, Graham Cox <graham@...> wrote:

Yes, I think it’s putting these 6 characters into the string.

The original data is an HTML page, and these strings come from some embedded JavaScript on the page - I’m scraping the page to extract specific bits of information, and it generally works OK, except for this minor formatting issue. Though the page declares it is using UTF-8 encoding, I’m wondering if that applies even to embedded JavaScript strings - perhaps they need to be treated as C strings?

I can write some code to deal with it, but it just seems like something NSString can already do.

—Graham



On 1 Jun 2019, at 12:57 am, Quincey Morris <quinceymorris@...> wrote:

On May 31, 2019, at 07:51 , Glenn L. Austin <glenn@...> wrote:

Is it possible that the code "\u002D" is in the string as the six characters? It is just the minus sign, but could the source have encoded certain characters so they wouldn't be accidentally interpreted?

That’s what I was thinking too.

This would easily be resolved if we could see the bytes of the NSData in hex.
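A dump like that takes only a few lines. The following is a minimal sketch in Swift; the helper name hexDump and the sample data are invented for illustration. If the escape is stored literally, the output shows six ASCII bytes rather than the single byte 2d of a real minus sign.

```swift
import Foundation

/// Hypothetical helper: formats each byte of `data` as two lowercase
/// hex digits, separated by spaces.
func hexDump(_ data: Data) -> String {
    return data.map { String(format: "%02x", $0) }.joined(separator: " ")
}

// Sample input: the six characters backslash, u, 0, 0, 2, D as UTF-8.
let sample = Data("\\u002D".utf8)
print(hexDump(sample))  // 5c 75 30 30 32 44
```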


