Next: Experiments Up: Fast k nearest neighbour Previous: The k Nearest Neighbour

Simplified k Nearest Neighbour Search

 

The simplified knn search departs from Roussopoulos et al.'s method in two main ways, which together can improve performance:

  1. Collect all MBRs that overlap the query point and expand them together, so that their children at the next lower level form a single, larger branchlist. This is repeated recursively down to the LEAF level.
  2. Avoid calculating MINMAXDIST. Use only MINDIST and the maximum distance value in the knn list for pruning (Roussopoulos et al.'s pruning strategy 3).
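The two changes can be sketched as a depth-first branch-and-bound search. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the `Node` layout and the helpers `leaf`, `inner`, `KnnList`, `search` and `knn_search` are hypothetical, and the policy of searching the group of overlapping children together before the remaining branchlist entries is one reading of change 1. It assumes a balanced R-tree, so nodes grouped together always sit on the same level.

```python
import heapq
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical minimal R-tree node: internal nodes hold child nodes,
    # leaves hold data points; low/high are the corners of the node's MBR.
    low: tuple
    high: tuple
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children

def leaf(points):
    dims = range(len(points[0]))
    return Node(tuple(min(p[i] for p in points) for i in dims),
                tuple(max(p[i] for p in points) for i in dims), points=list(points))

def inner(children):
    dims = range(len(children[0].low))
    return Node(tuple(min(c.low[i] for c in children) for i in dims),
                tuple(max(c.high[i] for c in children) for i in dims),
                children=list(children))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mindist(q, node):
    # Squared MINDIST: 0 when q lies inside (or on) the node's MBR.
    return sum((lo - x) ** 2 if x < lo else (x - hi) ** 2 if x > hi else 0.0
               for x, lo, hi in zip(q, node.low, node.high))

class KnnList:
    # The k closest points found so far, kept in a max-heap on distance.
    def __init__(self, k):
        self.k, self.heap = k, []

    def bound(self):
        # Largest distance in the list; infinite until k points are held.
        return -self.heap[0][0] if len(self.heap) == self.k else math.inf

    def offer(self, d, p):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-d, p))
        elif d < self.bound():
            heapq.heapreplace(self.heap, (-d, p))

    def result(self):
        return sorted((-nd, p) for nd, p in self.heap)

def search(nodes, q, knn):
    # `nodes` is a group of same-level nodes that is expanded together.
    if nodes[0].is_leaf:
        for n in nodes:
            for p in n.points:
                knn.offer(sq_dist(q, p), p)
        return
    # Change 1: one merged branchlist from the children of every node in the
    # group. Change 2: it is ordered by MINDIST only, no MINMAXDIST is computed.
    branch = sorted(((mindist(q, c), c) for n in nodes for c in n.children),
                    key=lambda e: e[0])
    overlapping = [c for d, c in branch if d == 0.0]
    if overlapping:
        search(overlapping, q, knn)  # children containing q go first, together
    for d, c in branch:
        if d == 0.0:
            continue
        if d > knn.bound():  # strategy 3; later entries have larger MINDIST
            break
        search([c], q, knn)

def knn_search(root, q, k):
    knn = KnnList(k)
    search([root], q, knn)
    return knn.result()

# Toy balanced tree: two internal nodes, each with two leaves.
POINTS = [(0, 0), (1, 1), (2, 0), (1, 2), (3, 3), (5, 5), (6, 4), (8, 8), (9, 7)]
root = inner([inner([leaf(POINTS[0:3]), leaf(POINTS[3:5])]),
              inner([leaf(POINTS[5:7]), leaf(POINTS[7:9])])])
print(knn_search(root, (1.5, 1.0), 3))
# → [(0.25, (1, 1)), (1.25, (1, 2)), (1.25, (2, 0))]
```

Squared distances are used throughout because the square root is monotone and changes no comparison. Because the branchlist is sorted by MINDIST and the knn bound can only shrink, the first entry that fails the strategy 3 test ends the loop.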

Figure 3: Overlapping rectangles with a query point

For knn search, the MBRs to search first are those with zero MINDIST, that is, the MBRs that overlap the query point P (A and C in Figure 3). Compared with the non-overlapping MBRs, these usually contain most of the closest data objects, unless P lies very near an MBR boundary. Figure 3 gives a simple example of why the overlapping rectangles should be expanded together before a branch-and-bound search is performed. If the non-leaf MBR A is browsed first, a branchlist of its leaf MBRs A3, A1 and A2 (sorted by MINDIST) is created and searched for nearest neighbours. However, since A and C both have a MINDIST of zero, C may just as well be browsed first; searching C first could find some of the real knn, but probably not all of them. Suppose that C1 and A3 together contain all of the real knn: whichever of A or C is selected first, extra searches result. If A and C are instead treated together, with the merged branchlist A3, C1, C2, A1, A2, C3, then A3 and C1 are searched first and some of the remaining MBRs can be pruned by Roussopoulos et al.'s strategy 3 [10]. This situation is, however, less likely to arise in randomly distributed databases indexed by an R-tree, because MBRs overlap very little, so few MBRs (possibly none) contain the query point.
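The Figure 3 scenario can be reproduced with a short Python sketch. The rectangle coordinates are hypothetical, chosen only so that the merged ordering matches the one given above (the real geometry of the figure is not available), and the helper names `mindist_sq`, `bounding` and `merged_branch_list` are illustrative.

```python
def mindist_sq(q, low, high):
    # Squared MINDIST: 0 when q lies inside the rectangle [low, high].
    d = 0.0
    for x, lo, hi in zip(q, low, high):
        if x < lo:
            d += (lo - x) ** 2
        elif x > hi:
            d += (x - hi) ** 2
    return d

# Hypothetical leaf MBRs (low corner, high corner) under the non-leaf MBRs
# A and C; coordinates are invented so the merged order matches the text.
TREE = {
    "A": {"A1": ((0, 4), (2, 6)), "A2": ((0, 0), (2, 3)), "A3": ((4, 4), (6, 6))},
    "C": {"C1": ((6, 4), (8, 7)), "C2": ((4, 7), (7, 10)), "C3": ((8, 8), (10, 10))},
}

def bounding(rects):
    # MBR of a collection of (low, high) rectangles.
    lows, highs = zip(*rects)
    return tuple(map(min, zip(*lows))), tuple(map(max, zip(*highs)))

def merged_branch_list(q, tree):
    # Every non-leaf MBR that contains q (MINDIST == 0) is expanded, and all of
    # their children are merged into a single branchlist sorted by MINDIST.
    overlapping = [name for name, kids in tree.items()
                   if mindist_sq(q, *bounding(kids.values())) == 0.0]
    branch = [(mindist_sq(q, lo, hi), child)
              for name in overlapping
              for child, (lo, hi) in tree[name].items()]
    return [child for _, child in sorted(branch)]

print(merged_branch_list((5, 5), TREE))  # → ['A3', 'C1', 'C2', 'A1', 'A2', 'C3']
```

With P = (5, 5) inside both A and C, the leaf MBRs of both parents are interleaved by MINDIST, so A3 and C1 are visited before any other leaf MBR, whichever parent would have been browsed first.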

To obtain a faster knn search method, the MINMAXDIST calculation can be avoided by using MINDIST as the only ordering metric. MINDIST is cheaper to compute than MINMAXDIST and is very informative: it shows whether a query point overlaps an MBR (MINDIST = 0) and, if it does not, gives the minimum possible distance to that MBR. Roussopoulos et al. also suggested that MINDIST can be used as effectively as MINMAXDIST if the dead space (space in an MBR that contains no data objects or child MBRs) is very small (Figure 7 in [10]).

Indeed, our view is that MINDIST is always effective, with or without MINMAXDIST and regardless of the amount of dead space. If an MBR has a MINDIST value smaller than the maximum distance in the nearest neighbours list, we cannot be certain that it holds no closer child node or data object without searching it, whatever the dead space or MINMAXDIST values may be. Maximum efficiency could only be achieved if the real minimum distance between the query point and the data objects inside an MBR could be obtained without exploring down to the leaf nodes; as far as we know, there is no way of achieving this in high-dimensional space. In fact, pruning strategy 3 in section 3 can be applied on its own every time the knn list gains a new, closer object, with the same result and efficiency as when it is combined with the other two strategies. Moreover, we have found that Roussopoulos et al.'s knn search can be made up to 30% faster by using MINDIST alone to prune MBRs instead of both MINDIST and MINMAXDIST.
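The cost argument can be made concrete with a minimal Python sketch of both metrics, written from the definitions in Roussopoulos et al. [10] and working with squared distances (the square root is monotone, so no comparison changes). MINDIST needs one comparison and at most one subtraction per dimension, while MINMAXDIST needs the midpoint of every extent, a near-face and a far-face distance per dimension, and a final minimum over all dimensions. The function names, including the `prune_by_mindist` helper that applies strategy 3 on its own, are illustrative assumptions rather than the authors' code.

```python
def mindist_sq(p, low, high):
    # Squared MINDIST: per dimension, distance to the rectangle if p is outside it.
    d = 0.0
    for x, s, t in zip(p, low, high):
        if x < s:
            d += (s - x) ** 2
        elif x > t:
            d += (x - t) ** 2
    return d

def minmaxdist_sq(p, low, high):
    # Squared MINMAXDIST: for each dimension k, use the nearer face in k and the
    # farther edge in every other dimension, then take the minimum over k.
    near, far = [], []
    for x, s, t in zip(p, low, high):
        mid = (s + t) / 2
        near.append((x - (s if x <= mid else t)) ** 2)
        far.append((x - (s if x >= mid else t)) ** 2)
    total_far = sum(far)
    return float(min(total_far - f + n for n, f in zip(near, far)))

def prune_by_mindist(p, mbrs, knn_bound_sq):
    # Strategy 3 on its own: keep an MBR only if its MINDIST is below the
    # largest (squared) distance currently in the knn list.
    return [m for m in mbrs if mindist_sq(p, *m) < knn_bound_sq]

p = (0, 0)
mbrs = [((1, 1), (3, 2)), ((4, 0), (5, 1))]
print([mindist_sq(p, *m) for m in mbrs])     # → [2.0, 16.0]
print([minmaxdist_sq(p, *m) for m in mbrs])  # → [5.0, 17.0]
print(prune_by_mindist(p, mbrs, 3.0))        # → [((1, 1), (3, 2))]
```

In this example the second MBR is discarded using MINDIST alone, without MINMAXDIST ever being computed.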






Joseph Kuan
Wed Jun 3 13:57:27 BST 1998