Word detection without separating spaces


Sandor Szatmari
 

You know how Apple apps can turn ‘itwas’ into ‘it was’ as you’re typing in a text field? Is there an API for this? I’d like to be able to take something like AREALLYLONGSTRING and turn it into ‘a really long string’ or ‘a_really_long_string’, or basically insert whatever separator I want.

Something like -(NSString*)componentsSeparatedByDictionaryWord:(NSString*)string;

Does my magical imaginary API exist? I’d imagine it’s part of autocorrection functionality…

Thanks,
Sandor
Trying hard not to reinvent the wheel


Ben Kennedy
 

On 27 Sep 2018, at 5:21 am, Sandor Szatmari <admin.szatmari.net@...> wrote:

You know how Apple apps can turn ‘itwas’ into ‘it was’ as you’re typing in a text field? Is there an API for this? I’d like to be able to take something like AREALLYLONGSTRING and turn it into ‘a really long string’ or ‘a_really_long_string’, or basically insert whatever separator I want.

Some quick googlage suggests that UITextChecker (iOS) and NSSpellChecker (macOS) might fit the bill...?

-ben


Sandor Szatmari
 

Yes, thanks! I have been playing with NSSpellChecker but haven’t been able to configure it with the magic sauce to get it to tokenize the way I am describing.

Sandor





Jeremy Hughes
 

You could try taking the first n characters of the string (where n runs from 1 to the maximum word length) and spell-checking that substring. When you find a substring that spell-checks, do the same for the characters that follow it, and so on. If you get to a point where you can’t find a word in the remaining characters, backtrack and look for a longer word in the previous characters.

Some strings might have multiple answers:

‘findale’ could be ‘find ale’ or ‘fin dale’
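
Here is a rough sketch of that backtracking idea in Python, just to make the steps concrete. Everything in it is an assumption for illustration: `segmentations`, `is_word`, `max_word_len` and the tiny `WORDS` set are made-up names, and `WORDS` only stands in for a real spell checker such as NSSpellChecker or UITextChecker, which you would have to wrap yourself. It collects every possible split rather than stopping at the first one, so the ambiguous case shows up too.

```python
def segmentations(text, is_word, max_word_len=20):
    """Return every way to split `text` into words accepted by `is_word`.

    `is_word` is a hypothetical callback; in a real app it would wrap a
    spell checker rather than a hard-coded word list.
    """
    text = text.lower()
    results = []

    def extend(start, words):
        # Every character has been consumed: this path is a full split.
        if start == len(text):
            results.append(words)
            return
        longest = min(max_word_len, len(text) - start)
        # Try the next 1..longest characters as a candidate word.
        for n in range(1, longest + 1):
            candidate = text[start:start + n]
            if is_word(candidate):
                extend(start + n, words + [candidate])
        # Returning from here is the backtracking step: the caller
        # moves on to a longer candidate at the previous position.

    extend(0, [])
    return results


# Hypothetical stand-in dictionary, not a real spell checker.
WORDS = {"a", "really", "long", "string", "find", "ale", "fin", "dale"}

for split in segmentations("AREALLYLONGSTRING", WORDS.__contains__):
    print("_".join(split))

for split in segmentations("findale", WORDS.__contains__):
    print(" ".join(split))
```

Because each split comes back as a list of words, you can join it with whatever separator you like. Note that trying every split can get slow on long strings with many short dictionary words, so a real version would probably want to cache results per start position or stop at the first match.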

Jeremy


 



On Sep 27, 2018, at 5:21 AM, Sandor Szatmari <admin.szatmari.net@...> wrote:

I’d like to be able to take something like AREALLYLONGSTRING and turn it into ‘a really long string’ or ‘a_really_long_string’ or basically insert whatever separator I want.

NSLinguisticTagger provides a lot of operations like finding word boundaries and identifying parts of speech. I don’t know if it can be made to identify English words without spaces between them.* Apple has added some newer APIs that use machine learning to analyze natural language, but I can’t remember what the framework is called; maybe it has some support for that?

—Jens

* It does, however, detect word breaks in East Asian languages that don’t put any separator characters between words.