In memory prefix searches – TST Vs HashMap

Ternary Search Trie (TST) is an advanced data structure to store simple key-value pairs that lends itself to really fast searches. TST typicall works well when:

  • Search results are significantly less in number compared to total size of the data store.
  • Keys are of String type
  • For example, TST is a very effective solution when searching 1 phone record from a million or when searching 10 phone records based on a prefix match (say first 6 digits of the phone number)

TST does become slow when result set is huge since the response time is directly dependent on number of tree nodes matching the search filter.

More on TST – http://en.wikipedia.org/wiki/Ternary_search_tree

In this blog post, I have a response time comparison between TST and HashMap. Source code used for this comparison is available at – https://sourceforge.net/p/noreo/code/HEAD/tree/quickscan/

How to setup the code

    1. Import the code as Eclipse project
    2. Since we need at least million records to test this code effectively, you can download the same from
    3. Modify the csv file to include only 7 columns as mentioned in com.ivy.datastructures.simpledb.domain.Geoname
    4. Modify input file path in com.ivy.datastructures.simpledb.utils.TestFileHelper

Executing the test
To execute the test, simply run the 2 Junits in TestFileHelper. Both TST and HashMap versions will be executed on the same dataset and for the same search filter. Results of the different search patterns are actually interesting.

Search response times
With a relatively good computer at my disposal, the above tests with various inputs (resulting low to high number of matches) gives some interesting insights into the data store response time.
Comparison

As you can clearly see, TST performance was significantly better than HashMap till the search filter resulted in a smaller set of matches. As the number of matching records grew, HashMap outperformed TST.

Conclusion
For a very good number of cases where you would want to search key-value based data, TST gives a better performance. Also, the fact that is supports prefix based searches makes it an even more efficient data structure.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s