An enhanced dynamic hash TRIE algorithm for lexicon search
Abstract
Information retrieval (IR) is essential to enterprise systems along with growing orders, customers and materials. In this article, an enhanced dynamic hash TRIE (eDH-TRIE) algorithm is proposed that can be used in a lexicon search in Chinese, Japanese and Korean (CJK) segmentation and in URL identification. In particular, the eDH-TRIE algorithm is suitable for Unicode retrieval. The Auto-Array algorithm and Hash-Array algorithm are proposed to handle the auxiliary memory allocation; the former changes its size on demand without redundant restructuring, and the latter replaces linked lists with arrays, saving the overhead of memory. Comparative experiments show that the Auto-Array algorithm and Hash-Array algorithm have better spatial performance; they can be used in a multitude of situations. The eDH-TRIE is evaluated for both speed and storage and compared with the naïve DH-TRIE algorithms. The experiments show that the eDH-TRIE algorithm performs better. These algorithms reduce memory overheads and speed up IR.
- Publication:
-
Enterprise Information Systems
- Pub Date:
- November 2012
- DOI:
- Bibcode:
- 2012EntIS...6..419Y
- Keywords:
-
- machine retrieval;
- hash;
- TRIE;
- uniqueness;
- Java;
- natural language processing;
- intelligent information processing