Domain Madness!

by Owen Densmore





OK, I admit it, I may have too much time on my hands,
but I was looking for a new DNS domain name, and decided to shove all of
the 234,937 words in /usr/share/dict/words through Whois, collecting those
not having a .com entry. The script is attached below. Currently the words
file lists the words in Webster's Second International, whose 1937 copyright
has expired.


Oddly enough, the result is that 63% of the words are NOT
registered names!! That's right, 147,886 words are not taken. That's the
good news. The bad news is that many of them are pretty weird. I've stashed
these guys, both compressed and clear, on

   
http://backspaces.net/files/NonDNSWords



   
http://backspaces.net/files/NonDNSWords.gz



For example, here's the list of all 43 4-letter words not taken:

grep ^....$ NonDNSWords
bikh fowk hawm koae odso shlu waup yeuk yirn
dird frib hewt kuar oime suld wusp yigh yirr
dowf gawm jaob mowt paut syrt wype yilt yuft
dowp ghuz jewy munj phoh uily yalb yirk
emyd gype jhow niog rynt wauf ycie yirm
I can't see one that calls out to me, really. A lot of these do not appear in my dictionary, but the Second International was known for stretching!


Here's a random sampling of 100 of the 6 letter critters:

grep ^......$ NonDNSWords | ran 100 | column -x
diaene evener tummer burdie palpon coccid taxwax chanst madefy buntal
haggly masted untone dutied unmiry cynips psetta otitic gawcie beflag
midpit orgyia tutory amylic begnaw punlet adigei scrank bedrop lusory
dorize repale unmold snurly scotic unsing uplead hemine unnose stibic
funori cobcab yengee cahita rutuli menkib uptend sassak beflap crants
ocyroe rugose avowry mogdad coecal elleck ptotic kommos amusgo lemosi
avitic amorua cacara ideist reswim napaea reshut egeran lechea emboly
korait uplick baeria kurvey ureido tuchit beroll adroop degged twisel
kechel solate unbare hardim upwaft sullan tineal uramil ovinia pappox
forrad jacami unlean byrlaw thymyl scrobe lyncid crenic bepity anoine
..where "ran" is a simple awk script, below, to randomly select n lines
from a file. It's kinda spooky doing all this on Mac OS X .. it really IS
Unix.

Again, not a lot of love. By the way, there were 5,166 6-letter
words, so this sampling has likely missed some real winners. Let
me know if you find any.

This got me a bit curious .. how do the words work out by size?
I.e. how many words are 6 letters long etc? Time for another script, also
attached below:

/usr/share/dict/words  NonDNSWords
len  count        %    len  count        %
  1     52   0.0221    .
  2    155   0.0660    .
  3   1351   0.5750    .
  4   5110   2.1751      4     43   0.0291
  5   9987   4.2509      5   1219   0.8243
  6  17477   7.4390      6   5166   3.4932
  7  23734  10.1023      7  10725   7.2522
  8  29926  12.7379      8  16593  11.2201
  9  32380  13.7824      9  20861  14.1061
 10  30867  13.1384     10  22254  15.0481
 11  26011  11.0715     11  20415  13.8046
 12  20460   8.7087     12  17065  11.5393
 13  14937   6.3579     13  12935   8.7466
 14   9763   4.1556     14   8811   5.9580
 15   5924   2.5215     15   5433   3.6738
 16   3377   1.4374     16   3146   2.1273
 17   1813   0.7717     17   1681   1.1367
 18    842   0.3584     18    804   0.5437
 19    428   0.1822     19    408   0.2759
 20    198   0.0843     20    189   0.1278
 21     82   0.0349     21     79   0.0534
 22     41   0.0175     22     38   0.0257
 23     17   0.0072     23     16   0.0108
 24      5   0.0021     24      5   0.0034
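
A side-by-side layout like the one above can be produced in one pass. This is only a sketch: the function name side_by_side is made up for illustration, and it assumes the second file is a subset of the first, so any length present in the second also appears in the first.

```shell
#!/bin/sh
# Hypothetical helper: length histograms of two word files, side by side.
# Lengths with no words in the second file are shown as ".".
side_by_side() {
    awk '
    FNR == 1 { f++ }                 # f is 1 for the first file, 2 for the second
    {
        n[f, length($0)]++           # words of this length in file f
        tot[f]++                     # total words in file f
        if (length($0) > max) max = length($0)
    }
    END {
        for (len = 1; len <= max; len++) {
            if (!((1, len) in n)) continue
            printf "%3d %6d %8.4f", len, n[1, len], 100 * n[1, len] / tot[1]
            if ((2, len) in n)
                printf "    %3d %6d %8.4f\n", len, n[2, len], 100 * n[2, len] / tot[2]
            else
                print "    ."
        }
    }' "$1" "$2"
}

# Usage, assuming both files are in place:
# side_by_side /usr/share/dict/words NonDNSWords
```

Each percentage is relative to the total of its own file, which is how the two halves of the table are computed.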



Well, the most populous word length in NonDNSWords is 10 letters; here's a sample:

dramseller floriation tractional clanswoman periphrase
cyrtometer symphytize convolvuli mucigenous clamminess
hyperacute myrtlelike unharbored ergonovine undertided
digressory preclosure parnassism habilatory boycottism
nilometric paralgesic trimacular annelidian breeziness
prelegatee admiringly scatophagy bonebinder morphinism
endosteoma ranivorous undistinct solenodont scathingly
unfreckled unpanelled impalpably unemphatic staverwort
gradientia cystospasm xenocratic cogredient rubescence
neurolytic unrebutted saponacity brachyoura depatriate

OK, I know you want to know the 5 24-letter words, so here they are:


formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize


..and, yup, antidisestablishmentarianism wasn't there.


This did make it easy to search for substrings of interest. For example, I wanted to
find all the words with "plex" in them. There were 54:

grep plex NonDNSWords | column -x
amplexation amplexicaudate amplexicaul amplexicauline amplexifoliate
autocomplexes cerviciplex complexedness complexionably complexional
complexionally complexioned complexionless complexively complexly
decemplex diaplexal diaplexus epiplexis euplexoptera
ganglioplexus holoplexia intercomplexity kataplexy myelapoplexy
nulliplex overcomplex overcomplexity perplexable perplexedly
perplexedness perplexingly perplexment phantoplex plexicose
pleximeter pleximetric plexodont plexometer plexure
pseudoapoplexy reperplex retroplexed semiamplexicaul semiduplex
sextuplex simplexed supercomplex triplexity ultracomplex
unimultiplex unperplexed unperplexing veniplex

This is a bit more interesting: holoplexia.com sounds nifty, as does nulliplex.com.
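
Substring searches like this get more useful with a length cap, since the short names are the ones worth having. A minimal sketch, assuming NonDNSWords sits in the current directory; the function name short_matches and its argument order are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: words containing SUBSTRING that are at most MAXLEN letters long.
# Arguments: substring, max length (default 10), word file (default NonDNSWords).
short_matches() {
    grep -- "$1" "${3:-NonDNSWords}" | awk -v max="${2:-10}" 'length($0) <= max'
}

# Usage, assuming the word file is present:
# short_matches plex 9 | column -x
```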


So, I guess you're wondering which one I took, right? Well, sadly,
none of them. While groveling around, I thought of a two-word critter I kinda
like: ComplexityWorks.com, so hmm..all this was a waste? I think not, but...

Scripts:

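Check a list of words w/ whois.
#!/bin/sh
# Arguments (all optional): pattern, starting prefix, word file
pat=${1:-"^...*"}                  # grep pattern the words must match
start=${2:-a}                      # skip ahead to the first word with this prefix
file=${3:-/usr/share/dict/words}
# Take the file from the first line matching $start to the end, then filter by $pat
words=$(sed -n "/^$start/,\$p" "$file" | grep -i "$pat")
for w in $words ; do
    # Keep only "No match for" replies, strip the text around the word, lowercase it
    whois "$w.com" | \
    sed -n '/No match for/{s:.*for .::;s:......$::p;}' | \
    tr A-Z a-z
done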
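
The sed expression in that script is terse, so here is a sketch of what it does to a single reply. The sample line is an assumption about the whois output format, inferred from the expression itself (one character dropped after "for ", six dropped from the end); the function name parse_no_match is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read whois output on stdin and print the bare word
# for any "No match for" line, lowercased.
parse_no_match() {
    # s:.*for .::    drops everything through the quote that follows "for "
    # s:......$::p   drops the trailing six characters (.COM".) and prints
    sed -n '/No match for/{s:.*for .::;s:......$::p;}' | tr A-Z a-z
}

# Assumed reply format for an unregistered name:
printf '%s\n' 'No match for "HOLOPLEXIA.COM".' | parse_no_match
# prints: holoplexia
```

Replies without a "No match for" line produce no output, which is why only the unregistered words end up in the list.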

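For choosing N random samples from a stream:
#!/bin/sh
samples=$1
awk -v samples="$samples" '
{a[NR]=$0} # Read in file
END {
    # No srand() call here, so many awks repeat the same picks on every run
    for (len=NR; samples > 0 && len > 0; samples--) {
        i=int(rand()*len)+1   # uniform index in 1..len
        print a[i]
        a[i]=a[len]           # move the last line into the hole
        delete a[len]
        len--
    }
}'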
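
The script above reads the whole file into memory before picking. A streaming alternative is reservoir sampling, which is a different technique from the one in the script above: it keeps only N lines at a time. This is a minimal sketch; the function name ran_reservoir and the optional seed argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print N lines chosen uniformly at random from stdin,
# holding at most N lines in memory (reservoir sampling, Algorithm R).
# Arguments: N (default 10), optional seed for repeatable runs.
ran_reservoir() {
    awk -v n="${1:-10}" -v seed="${2:-}" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    NR <= n { r[NR] = $0; next }     # the first n lines fill the reservoir
    {
        j = int(rand() * NR) + 1     # uniform in 1..NR
        if (j <= n) r[j] = $0        # keep this line with probability n/NR
    }
    END {
        m = (NR < n) ? NR : n        # fewer than n lines: print them all
        for (i = 1; i <= m; i++) print r[i]
    }'
}

# Usage, assuming NonDNSWords is present:
# grep '^......$' NonDNSWords | ran_reservoir 100 | column -x
```

The sample is uniform, but the output order is not shuffled; passing a seed makes a run repeatable, and leaving it off seeds from the clock.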

For counting the words in a stream by length:
#!/bin/sh
awk '
{a[length]++}
END {
for (i in a) printf "%2i %10i %10.4f \n", i, a[i], 100*a[i]/NR
}' | sort -n









I'm curious: How did you pick *your* domain?!


7 Comments

mentata
2003-01-02 07:19:19
the Dr. Seuss technique
I read a similar article several years ago that painted a bleaker picture because it followed trends to predict that all words in the English language would be used up by some relatively recent sounding year. I tried a few favorite obscure words (I only found out later that Eric Raymond took "thyrsus"), but no luck. Rather than frustrate myself with more whois searches to get something ultimately unsatisfying, I decided to make up a word myself.


The result: mentata.com. I'll bet Tim O'Reilly can tell you where it comes from, but even beyond the concepts, it's short, easy to remember, and has kind of a catchy ring to it.


Don't feel guilty, this is important work. Thanks for the list.

crynn
2003-01-02 15:01:33
How disappointing...
www.antidisestablishmentarianism.com is a placeholder for a domain registration company.


I'd be curious to discover how many of those domains that are registered with whois are actually being held for sale...

anonymous2
2003-01-02 16:19:14
Random script didn't work on RedHat 8
In order to make the random script not return the same result every time, a call to srand() needs to be included to seed the random number generator:


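#!/bin/sh
samples=$1
awk -v samples="$samples" '
{a[NR]=$0} # Read in file
END {
    srand()                   # seed from the clock so each run differs
    for (len=NR; samples > 0 && len > 0; samples--) {
        i=int(rand()*len)+1   # uniform index in 1..len
        print a[i]
        a[i]=a[len]           # move the last line into the hole
        delete a[len]
        len--
    }
}'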

Owen
2003-01-02 16:53:13
Random script didn't work on RedHat 8
Thanks! I thought a bit about whether or not I wanted it to be "reproducible" .. i.e. repeat each run. I think I like your approach better than mine, 'cause I can do the same probe several times with different results.
anonymous2
2003-07-01 07:57:48
word crisis
It could be more serious than you think. I wondered how you got 52 1-letter words out of a 26-letter alphabet... the word list includes the cap and lowercase for each letter as a 'word' - further scrutiny revealed there are separate entries for words that can be capitalized or not, e.g., Bill, bill, Mark, mark, Will, will. No telling how soon the pool will dry up now!


Of course, a 1934 dictionary is going to be shy a few words that have surfaced in the latter part of the last century.

anonymous2
2003-11-03 07:03:17
Do mine maddness
Would you be so kind as to tell me if prenaptualagreementalisticallyminded.com is taken?
anonymous2
2003-11-04 08:35:02
Do mine maddness
yes, but antiprenaptualagreementalisticallyminded.com is still available