Find words with the most anagrams efficiently using Python
Following my previous post about 9-letter anagrams, I am posting the final code I have created, taking into account suggestions/snippets from Michael, Toby and Martin. I added two variables to make it nice and easy to modify what to look for.
Code
    # -*- coding: utf-8 -*-
    from time import time
    from collections import defaultdict

    ag_len = 10  # Anagram word length
    ag_min = 2   # Min # of anagrams
    dictionary_path = '/usr/share/dict/british-english'

    tic = time()

    # Group words of the requested length by their sorted letters
    wd = defaultdict(set)
    with open(dictionary_path, 'r') as f:
        for l in f:
            l = l.strip()
            if ag_len == len(l):
                wd["".join(sorted(l))].add(l)

    # Print every group that has at least ag_min members
    for ws, wl in wd.items():
        if len(wl) >= ag_min:
            print(" ".join(wl))

    toc = time()
    print(toc - tic, 's')
Explanation
Words of the chosen length are read from the dictionary file into a Python dictionary. The key for each entry is the letters of the word in sorted order, e.g.:
"".join(sorted('arranging')) = 'aagginnrr'
The value is the set of original (unsorted) words. Because words that are anagrams of each other are identical once their letters are sorted, adding each word to the set stored under its sorted key means that all anagrams end up sharing the same key. E.g.:
When the loop gets to megatons it will create a new key in the dictionary like so:
{'aegmnost': set(['megatons'])}
Then to magnetos:
{'aegmnost': set(['magnetos', 'megatons'])}
Then to montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}
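The grouping step can be tried without a dictionary file. The following is a minimal sketch for Python 3 that works on a small in-memory word list; the helper names signature and group_anagrams are made up for illustration and are not part of the script above.

```python
from collections import defaultdict


def signature(word):
    """Return the word's letters in sorted order, e.g. 'megatons' -> 'aegmnost'."""
    return "".join(sorted(word))


def group_anagrams(words):
    """Map each signature to the set of words that share it."""
    groups = defaultdict(set)
    for word in words:
        # Anagrams produce the same signature, so they land in the same set.
        groups[signature(word)].add(word)
    return groups


groups = group_anagrams(["megatons", "magnetos", "montages", "painters"])
print(sorted(groups["aegmnost"]))  # ['magnetos', 'megatons', 'montages']
```

Using a set as the value also means a word that appears twice in the input is only stored once.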
Then we loop over all the items in the dictionary we created and check whether the number of words stored under each key is at least the minimum we are looking for (ag_min).
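As a rough illustration of this filtering step, the sketch below keeps only the groups that meet the minimum and, as an extra that the script above does not do, lists the largest groups first. The function name anagram_sets and the example data are assumptions made for the sake of the example (Python 3).

```python
def anagram_sets(groups, min_count=2):
    """Return the groups with at least min_count words, largest first."""
    kept = [sorted(words) for words in groups.values() if len(words) >= min_count]
    # Largest group first; ties are broken alphabetically for stable output.
    return sorted(kept, key=lambda ws: (-len(ws), ws))


# Hand-built example in the same shape as the dictionary described above.
example = {
    "aegmnost": {"megatons", "magnetos", "montages"},
    "aeinprst": {"painters", "pertains"},
    "aagginnrr": {"arranging"},
}
for words in anagram_sets(example):
    print(" ".join(words))
# magnetos megatons montages
# painters pertains
```

Note the comparison is "at least" (>=), matching the script, so min_count=2 keeps plain anagram pairs.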
All done, a very elegant and simple method to find words with several anagrams for a given word length.
Results
I was going to post the interesting 10-letter anagrams I found; however, I couldn't find any with more than 2 anagrams with the dictionary I was using.
There is an 11-letter triple anagram:
anthologies anthologise theologians
and some 8-letter words with 4 or more anagrams:
painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe
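Rather than changing the word length and re-running, one possible variation is to group every word in a single pass and report the biggest group for each length. This is only a sketch for Python 3: the function name largest_group_by_length is hypothetical, and the sample list is a handful of words taken from the results above rather than a real dictionary file.

```python
from collections import defaultdict


def largest_group_by_length(words):
    """Return {word_length: biggest anagram group of that length, as a sorted list}."""
    groups = defaultdict(set)
    for word in words:
        # Anagrams always have the same length, so the sorted letters alone
        # are enough to keep different lengths apart.
        groups["".join(sorted(word))].add(word)
    best = {}
    for key, members in groups.items():
        if len(members) > len(best.get(len(key), ())):
            best[len(key)] = sorted(members)
    return best


sample = ["anthologies", "anthologise", "theologians",
          "painters", "pertains", "pantries", "repaints", "triangle"]
for length, members in sorted(largest_group_by_length(sample).items()):
    print(length, " ".join(members))
# 8 painters pantries pertains repaints
# 11 anthologies anthologise theologians
```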