Buy article online - an online subscription or single-article purchase is required to access this article.
Download citation
Download citation
link to html
The search for which k points are closest to a given probe point in a space of N known points, the `k-nearest-neighbor' or `KNN' problem, is a computationally challenging problem of importance in many disciplines, such as the design of numerical databases, analysis of multi-dimensional experimental data sets, multi-particle simulations and data mining. A standard approach is to preprocess the data into a tree and make use of the triangle inequality to prune the search time to the order of the logarithm of N for a single nearest point in a well balanced tree. All known approaches suffer from the `curse of dimensionality', which causes the search to explore many more branches of the tree than one might wish as the dimensionality of the problem increases, driving search times closer to the order of N. Looking for k nearest points can sometimes be done in approximately the time needed to search for one nearest point, but more often it requires k searches because the results are distributed widely. The result is very long search times, especially when the search radius is large and k is large, and individual distance calculations are very expensive, because the same probe-to-data-point distance calculations need to be executed repeatedly as the top of the tree is re-explored. Combining two acceleration techniques was found to improve the search time dramatically: (i) organizing the search into nested searches in non-overlapping annuli of increasing radii, using an estimation of the Hausdorff dimension applicable to this data instance from the results of earlier annuli to help set the radius of the next annulus; and (ii) caching all distance calculations involving the probe point to reduce the cost of repeated use of the same distances. The result of this acceleration in a search of the combined macromolecular and small-molecule data in a combined six-dimensional database of nearly 900 000 entries has been an improvement in the overall time of the searches by one to two orders of magnitude.

Subscribe to Journal of Applied Crystallography

The full text of this article is available to subscribers to the journal.

If you have already registered and are using a computer listed in your registration details, please email support@iucr.org for assistance.

Buy online

You may purchase this article in PDF and/or HTML formats. For purchasers in the European Community who do not have a VAT number, VAT will be added at the local rate. Payments to the IUCr are handled by WorldPay, who will accept payment by credit card in several currencies. To purchase the article, please complete the form below (fields marked * are required), and then click on `Continue'.
E-mail address* 
Repeat e-mail address* 
(for error checking) 

Format*   PDF (US $40)
   HTML (US $40)
   PDF+HTML (US $50)
In order for VAT to be shown for your country javascript needs to be enabled.

VAT number 
(non-UK EC countries only) 
Country* 
 

Terms and conditions of use
Contact us

Follow J. Appl. Cryst.
Sign up for e-alerts
Follow J. Appl. Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds