Smart quotes (and more) in Vim and Emacs

by Michael(tm) Smith

Related link: http://jason.diamond.name/weblog/2005/10/20/unicycle-script-for-vim







A while back I tried out a Unicode-aware editor called Mined. I found out about it from reading Ed Trager's A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX, which I wrote a short item about earlier.



One thing I really liked about Mined was its support for automatically inserting “smart quotes” while you type. At the time I saw it, I thought: Hey, wouldn’t it be nice if my main editor of choice, Vim, let me do that? Well, now it does…



UniCycling in Vim



A few days ago, Jason Diamond posted an entry on his blog about a new little Vim script he wrote called UniCycle. Here’s his description of it:



It’s called UniCycle because it cycles through different unicode characters as you’re typing them. It’s similar to the “Smart Quote” feature in Word except it’s easier to get back to a dumb quote if that’s what you really want: just hit the quote key again and it’ll cycle to the next character.


It works with hyphens (turning them into en and em dashes), periods (turning them into horizontal ellipses), apostrophes (turning them into left or right single quotation marks), and quotes (turning them into left or right double quotation marks).


How to install UniCycle


The script now has its own page at the Vim.org site, and you can download it from there. As with other Vim scripts, you install it by dropping it into your Vim plugin directory, which is ~/.vim/plugin by default. If you don’t know whether you have such a directory, you don’t need to check; you can just do a quick install like this:



mkdir -p ~/.vim/plugin && \
cd ~/.vim/plugin && \
wget -O unicycle.vim \
'http://www.vim.org/scripts/download_script.php?src_id=4689'
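Download links sometimes hand back an HTML error page instead of the file you asked for, so a quick sanity check before restarting Vim may save some head-scratching. What follows is only a sketch: check_plugin is a helper name made up for illustration (it is not part of UniCycle), and the "looks like HTML" test is a rough heuristic.

```shell
# check_plugin is a hypothetical helper, not part of UniCycle.
# It reports whether the given file exists, is non-empty, and does not
# start out looking like an HTML page (a rough heuristic only).
check_plugin() {
    file="${1:-$HOME/.vim/plugin/unicycle.vim}"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file"
        return 1
    fi
    if head -n 5 "$file" | grep -Eqi '<html|<!doctype'; then
        echo "looks like HTML, not a Vim script: $file"
        return 1
    fi
    echo "ok: $file"
}
```

Called with no argument, check_plugin looks at ~/.vim/plugin/unicycle.vim; pass a path to check a file somewhere else.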


Some short (optional) config


There are no additional install or config steps required, because once you’ve installed UniCycle in your ~/.vim/plugin directory, Vim loads it automatically each time it starts up. That said, there are a couple of things you might want to add to your ~/.vimrc file to make UniCycle work better.

" Turn UniCycle on by default for all XML and XSLT files
autocmd FileType xml,xslt UniCycleOn
" make the vim command-line 2 lines high so that we can see secret
" messages emitted by UniCycle
set cmdheight=2


As for the cmdheight=2 part, I’ll say more about that in a minute.
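If you would rather script that step than edit ~/.vimrc by hand, something along these lines should do it without adding the same line twice. This is a sketch under assumptions: append_once is a hypothetical helper name, and it only compares whole lines exactly, so a differently spaced copy of the same setting would not be detected.

```shell
# append_once is a hypothetical helper: it appends a line to a file only
# if that exact line is not already present.
append_once() {
    target="$1"
    line="$2"
    touch "$target"
    if ! grep -qxF -- "$line" "$target"; then
        printf '%s\n' "$line" >> "$target"
    fi
}

# Example calls (commented out so that pasting this block changes nothing):
# append_once ~/.vimrc 'autocmd FileType xml,xslt UniCycleOn'
# append_once ~/.vimrc 'set cmdheight=2'
```

Running the same call a second time leaves the file unchanged, so the commands are safe to keep in a setup script.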



Make sure vim starts in a UTF-8 environment


Before you start up Vim and give UniCycle a try, make sure to launch it in a UTF-8-ready way. Otherwise, it’s not going to work the way you would expect.



There are a couple of ways to launch vim in a UTF-8-ready way:



A. Gvim way

Run gvim instead of vim, and start it up like this:



LC_CTYPE=en_US.UTF-8 gvim


That will launch Gvim in a separate X-Window and you’ll be all ready to go.



B. Unicode X-terminal way

Start up a Unicode-enabled terminal such as mlterm or xterm and then run the vim command there.



LC_CTYPE=en_US.UTF-8 xterm -u8 -fn \
'-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1'


Using UniCycle

After you’ve started vim or gvim, open a new or existing *.xml or *.xsl file, hit i to get into insert mode, and type a quotation-mark (") character. If you did the Some short (optional) config step above, you should now see a curly left quotation mark, and a message in the vim “command-line” (at the bottom of the frame) saying “LEFT DOUBLE QUOTATION MARK” (that’s what the cmdheight=2 line in your ~/.vimrc file is for; it expands the command line so that you can see these messages). Hit " again, and you’ll see a message saying just “QUOTATION MARK”. Hit it one more time, and you’ll see “RIGHT DOUBLE QUOTATION MARK”.



If that all works for you as expected, try typing an apostrophe or dash. You’ll see that vim (uni)cycles through character choices for them just as it does for the quotation mark. Then try typing three dot/period characters in a row, and you’ll see vim replace them with a real ellipsis character.



Troubleshooting


If you didn’t do the Some short (optional) config step above or if you did but you’re still not seeing the behavior described above, manually type the :UniCycleOn and :set cmdheight=2 commands and then try again.



You should now see the “LEFT DOUBLE QUOTATION MARK” message. But if you see weird boxes or spaces or garbage characters where you’re expecting to see curly quotation marks, then it probably means you are not actually running vim in a UTF-8-ready way.
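One way to check that from the shell you launch vim in is to look at the locale variables. The sketch below assumes the usual precedence (LC_ALL, then LC_CTYPE, then LANG) and only inspects the locale name; is_utf8_locale is a made-up helper name, and a matching name does not guarantee that the terminal itself is in UTF-8 mode.

```shell
# is_utf8_locale is a hypothetical helper: it succeeds if the locale
# name passed in ends in a UTF-8 codeset, ignoring any @modifier.
is_utf8_locale() {
    case "${1%%@*}" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8) return 0 ;;
        *) return 1 ;;
    esac
}

# LC_ALL overrides LC_CTYPE, which overrides LANG.
current="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
if is_utf8_locale "$current"; then
    echo "UTF-8 locale in effect: $current"
else
    echo "not a UTF-8 locale: ${current:-(unset)}"
fi
```

If it reports a non-UTF-8 locale, relaunching with LC_CTYPE=en_US.UTF-8 set, as in the launch steps earlier, is the first thing to try.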



And if you see the message, but the quotation marks that appear don’t look so curly, it probably just means your default font doesn’t have good glyphs for curly quotes. So either try switching to a different font in your X-terminal; or if you are using Gvim, change the font by typing :set guifont=Monospace\ 13 (or whatever font and size you want to try).
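To judge a font without going through the editor at all, you can print the characters in question straight to the terminal. This is just a convenience sketch: show_chars is a name made up for illustration, and the bytes are written as octal UTF-8 escapes so that a plain POSIX printf can emit them.

```shell
# show_chars is a hypothetical helper that prints each character next to
# its code point and Unicode name. The characters are given as octal
# UTF-8 byte escapes so no special printf support is needed.
show_chars() {
    printf 'U+2013 EN DASH: \342\200\223\n'
    printf 'U+2014 EM DASH: \342\200\224\n'
    printf 'U+2018 LEFT SINGLE QUOTATION MARK: \342\200\230\n'
    printf 'U+2019 RIGHT SINGLE QUOTATION MARK: \342\200\231\n'
    printf 'U+201C LEFT DOUBLE QUOTATION MARK: \342\200\234\n'
    printf 'U+201D RIGHT DOUBLE QUOTATION MARK: \342\200\235\n'
    printf 'U+2026 HORIZONTAL ELLIPSIS: \342\200\246\n'
}

show_chars
```

If the characters look wrong here too, the problem lies with the terminal or its font rather than with Vim or UniCycle.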



If it all works out, you’ll end up with an easy way to type curly quotes and em/en dashes and ellipses in docs you edit in Vim. If it doesn’t, well, you can always consider switching to Emacs. :)



XMLUnicoding in Emacs



Vim is my main editor of choice, but there are some things for which Emacs currently provides a better editing environment. For example, there currently is no way to do context-sensitive validated editing in Vim. But there is a way to do it in Emacs. A very good way: using James Clark’s nXML mode.



nXML is a mighty piece of work. It’s hard to imagine now how I ever did any XML editing without it. As good as it is, though, when I first started using it to edit UTF-8-encoded documents, I found myself thinking: Hey, now that I can actually work with a document format that allows real (Unicode) special characters (instead of just some ASCII escape code or entity or whatever for representing those characters), wouldn’t it be great if I had an easy way to actually directly enter those special characters, especially characters for curly quotes and em/en dashes?



Enter XMLUnicode


At the same time I was just sitting around dreaming about it, Norm Walsh was actually doing something about it; namely, cooking up something in Emacs Lisp to make it work. The result is a package he named XMLUnicode.



Around the time when Norm released XMLUnicode, he also wrote up a blog entry about it, describing the variety of ways it gives you to enter special characters.



But to describe it briefly: it lets you enter smart quotes, em/en dashes, and ellipses in a way very similar to what UniCycle does, plus more.



(In fact, I guess that it’s a little odd to describe it that way, since it was around for quite a while before UniCycle and was actually, I believe, a big part of the inspiration for UniCycle.)

How to install XMLUnicode (and nXML mode)


Before installing XMLUnicode, you’ll probably first want to install nXML. It may already be packaged for your distro, so check first. For example, on a Debian system, you can install it with this command:



sudo apt-get install nxml-mode


To install it manually, you need to put it somewhere in your Emacs load path. If you have root access on the system where you want to install it, the appropriate place is probably /usr/local/share/emacs/site-lisp. So do something like this:



cd /usr/local/share/emacs/site-lisp/ && \
sudo wget \
http://www.thaiopensource.com/download/nxml-mode-20041004.tar.gz && \
sudo tar xvfz nxml-mode-20041004.tar.gz


Install XMLUnicode itself with a similar set of commands:



cd /usr/local/share/emacs/site-lisp && \
sudo wget http://nwalsh.com/emacs/xmlchars/xmlunicode.el && \
sudo wget http://nwalsh.com/emacs/xmlchars/unichars.el
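Before moving on to configuration, it may be worth confirming that both files actually landed in the directory. A minimal sketch, assuming the site-lisp location used above; report_missing is a hypothetical helper name, not something that ships with Emacs or XMLUnicode.

```shell
# report_missing is a hypothetical helper: given a directory and a list
# of file names, it prints each name that is not a readable file in that
# directory and returns non-zero if any were missing.
report_missing() {
    dir="$1"
    shift
    status=0
    for name in "$@"; do
        if [ ! -r "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return "$status"
}
```

For the install above, the call would be report_missing /usr/local/share/emacs/site-lisp xmlunicode.el unichars.el; no output means both files are in place.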


Getting nXML and XMLUnicode set up and available within Emacs takes a little more work than getting UniCycle working in Vim, but not too much more.



Some (non-optional) configuration


To configure nXML, and to configure XMLUnicode for use within nXML mode, add the following to your .emacs startup file.



;;; nXML setup
;; load autoloads for nXML mode
(load "rng-auto.el")
;; auto-start nXML mode for *.xml and *.xsl files
(setq auto-mode-alist
      (append (list (cons "\\.xml\\'" 'nxml-mode)) auto-mode-alist))
(setq auto-mode-alist
      (append (list (cons "\\.xsl\\'" 'nxml-mode)) auto-mode-alist))
;;; end of nXML setup

;;; xmlunicode.el setup
;; The xmlunicode.el code relies on some Common Lisp functions,
;; so you need to make sure the Common Lisp package is loaded
;; before loading xmlunicode.el
(require 'cl)
;; location where unichars.el file is installed; needs to be
;; specified before xmlunicode is loaded
(setq unicode-character-list-file
      "/usr/local/share/emacs/site-lisp/unichars.el")
(load "xmlunicode")
;; Set up xmlunicode for use within nXML mode
(defun bind-nxml-mode-keys ()
  (set-language-environment "utf-8")
  (define-key nxml-mode-map "\"" 'unicode-smart-double-quote)
  (define-key nxml-mode-map "\'" 'unicode-smart-single-quote)
  (define-key nxml-mode-map "\-" 'unicode-smart-hyphen)
  (define-key nxml-mode-map "\." 'unicode-smart-period)
  ;; display UniChar menu when in nXML mode
  (define-key nxml-mode-map [menu-bar unichar]
    (cons "UniChar" unicode-character-menu-map))
  ;; set input method to "xml" (xmlunicode) when in nXML mode
  (set-input-method 'xml))
;; run the function above each time nXML mode starts; without this
;; hook the key bindings are never applied
(add-hook 'nxml-mode-hook 'bind-nxml-mode-keys)
;;; End of xmlunicode setup


Using XMLUnicode

After you’ve started Emacs, visit a new or existing file with a .xml extension (foo.xml or whatever), and type a quotation-mark (") character. You should now see a curly left quotation mark. Hit " again, and you’ll see a regular straight quotation mark. Hit it one more time, and you’ll see a curly right quotation mark.



If that all works for you as expected, try typing an apostrophe or dash. You’ll see that Emacs cycles through character choices for them just as it does for the quotation mark. Then try typing three dot/period characters in a row, and you’ll see Emacs replace them with a real ellipsis character.



You’ll also notice that your Emacs now has a UniChar menu that you can use to insert a variety of other special characters. And that’s not the only additional feature that XMLUnicode provides for inserting special characters ― read the docs for it to find out more.



Troubleshooting


If the quotation marks that appear don’t look so curly, it probably just means your default font doesn’t have good glyphs for curly quotes. So try switching to a different font in your Emacs.



If it all works out, you’ll end up with an easy way to type curly quotes and em/en dashes and ellipses in any UTF-8-encoded docs you want to edit in Emacs, and also a menu and some additional commands for easily adding other special characters. If it doesn’t work out, well, you can always consider switching to Vim and using UniCycle. :)




Other methods for entering special characters in your favorite text editor?


3 Comments

aristotle
2005-10-24 06:38:27
Re:

Nice tip; the optional configuration step can be improved thusly:


First, since you’re not composing a command string, there’s no need to take the detour via :exe, you can invoke the command directly. But the user command is itself just a wrapper around call UniCycleOn(), so that’s what I’d use in a script.


Second, your autocommands turn it on when encoding is set to UTF-8, but don’t turn it off again when it’s changed to something else.


So the better version is


autocmd BufRead * if &encoding == "utf-8" | call UniCycleOn() | else | call UniCycleOff() | endif
autocmd EncodingChanged * if &encoding == "utf-8" | call UniCycleOn() | else | call UniCycleOff() | endif


Thanks for the pointers, this is an extremely cool plugin!

sideshowbarker.net
2005-10-24 07:55:05
Re: vimrc snippet
I went ahead and changed the vimrc snippet to show how to use FileType identification instead of doing it based on the encoding option. I think that is probably a better way to do it.