5 Commits

Author SHA1 Message Date
Jacob Harris
3b5507ad07 Oops, slight mistake in the contraction handling 2013-08-17 21:41:45 -04:00
Jacob Harris
b190084a02 Tweaking sentence tokenization, support for contractions 2013-08-17 21:38:14 -04:00
Sebastian Delmont
9620e24416 Use Unicode-compatible regular expressions.
Even though English was enough for the Lord to write the Bible, it's still a smart idea to allow for Unicode characters, if only to allow horses to coöperate in a way the New Yorker would approve.

See:
http://www.ruby-doc.org/core-1.9.3/Regexp.html#label-Character+Properties 
and 
http://www.newyorker.com/online/blogs/culture/2012/04/the-curse-of-the-diaeresis.html
2013-08-10 07:47:00 -04:00
Jacob Harris
ca71d20d80 Handle curly apostrophe correctly 2013-07-11 07:49:03 -04:00
Jacob Harris
c201b07a60 Fixes for launch 2013-07-09 21:54:44 -04:00