Unicode Normalise a String in Rails
One strange issue I had on Scribbles, especially when creating a page/post URL based on the title, was that sometimes someone would use... wait for it... full-width characters. Huh? What?
Yes, I had never encountered this... EVER. Here is a full-width page title: Ａｂｏｕｔ
Now that actually is just "About" but as "full-width" characters, and that poses a problem.
When Scribbles saves a post or page, it tries to parameterize the page/post title. Here is an example of my method:
if self.title.present?
  url = self.title.parameterize
else
  url = "#{self.published_date.strftime("%Y-%m-%d")}"
end
The problem with full-width characters is that they are not plain ASCII letters, so calling parameterize strips them out and returns an empty string, which then throws errors like there is no tomorrow. Not really, but having an empty URL kinda defeats the purpose of creating a URL.
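To see why the slug comes back empty, here is a minimal plain-Ruby sketch. The naive_slug helper is hypothetical and only approximates what a slug helper like parameterize does (keep ASCII letters and digits, turn everything else into dashes); it is not the Rails implementation.

```ruby
# Full-width "About", written with escapes so the code points are explicit
# (U+FF21, U+FF42, U+FF4F, U+FF55, U+FF54).
FULLWIDTH_TITLE = "\uFF21\uFF42\uFF4F\uFF55\uFF54"

# Hypothetical stand-in for a slug helper: keep ASCII letters and digits,
# replace every other run of characters with "-", then trim the dashes.
def naive_slug(text)
  text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts FULLWIDTH_TITLE.ascii_only?          # false: none of these are ASCII
puts naive_slug("About Me").inspect       # an ordinary title survives
puts naive_slug(FULLWIDTH_TITLE).inspect  # the full-width title does not
```

Every full-width letter falls outside a-z and 0-9, so the whole title collapses into a single dash that then gets trimmed away.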
Thankfully there is a built-in string method called unicode_normalize. It actually comes from Ruby itself rather than Rails, so it works in a Rails app without any extra setup. You can see what it does in the docs.
So now I do the following:
url = self.title.unicode_normalize(:nfkc).parameterize.downcase
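You can see here I also added :nfkc, which applies compatibility decomposition, followed by canonical composition. I had no idea what that meant at first, but in practice it folds compatibility variants such as full-width letters into their ordinary equivalents. Rails suggested it as it threw an error in my initial attempt. I also added a downcase just to make sure everything is lower case (parameterize already lowercases by default, so this is only a safety net).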
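Here is a rough plain-Ruby sketch of the difference between the normalisation forms, plus a slug helper that falls back to the date when nothing usable is left. The slug_for helper and the gsub-based slugging are assumptions for illustration (so the example runs without Rails); in the app itself parameterize does that part.

```ruby
require "date"

# Full-width "About" (U+FF21, U+FF42, U+FF4F, U+FF55, U+FF54).
FULLWIDTH_ABOUT = "\uFF21\uFF42\uFF4F\uFF55\uFF54"

# Canonical normalisation (:nfc) leaves full-width letters untouched,
# while compatibility normalisation (:nfkc) folds them into plain letters.
puts FULLWIDTH_ABOUT.unicode_normalize(:nfc) == FULLWIDTH_ABOUT  # true
puts FULLWIDTH_ABOUT.unicode_normalize(:nfkc)                    # About

# Hypothetical helper: normalise first, slugify, and fall back to the
# published date if the slug still ends up empty.
def slug_for(title, published_date)
  slug = title.unicode_normalize(:nfkc).downcase
              .gsub(/[^a-z0-9]+/, "-")
              .gsub(/\A-+|-+\z/, "")
  slug.empty? ? published_date.strftime("%Y-%m-%d") : slug
end

puts slug_for(FULLWIDTH_ABOUT, Date.new(2024, 1, 5))  # about
puts slug_for("!!!", Date.new(2024, 1, 5))            # 2024-01-05
```

Normalising before slugging means the emptiness check happens on the final result, so a title made only of punctuation still gets a usable URL.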
Anyway, this was interesting to figure out.