Full Program »
ND-GiST: A novel method for disk-resident k-mer indexing
Several challenges are related to metagenomics, one of which is the data management. A related central concept is k-mer which means a possible subsequence of length k from a DNA (sub)sequence. In this work, the focus is on indexing k-mers and supporting box queries where a query string of length k might have multiple allowed nucleobases per position. A novel index structure: ND-GiST is introduced which has capability to handle box queries. Comparing it with full table scan and the traditional B-tree, the performance results of ND-GiST are encouraging.