# Near(est) Neighbor Problems in Hamming Space

## 1 Summary of Known Cell-Probe Bounds

The table below summarizes known cell-probe bounds (number of probes $t$, for a table of $s$ cells of $w$ bits each) for the problems defined in Sections 2 and 3.

| Problem | Deterministic | Randomized |
| --- | --- | --- |
| $\lambda$-ENN, $\lambda$-ENNS, ENNS | $\Omega(\frac{d}{\log n})$ for poly | $\Omega(\frac{d}{\log n})$ [2] for poly; $\Omega(\frac{d}{\log\frac{sd}{n}})$ [6] |
| $\lambda$-ANN | $\Omega(\frac{d/\gamma^{3}}{\log\frac{sd}{n}})$ [6]; $\Omega(\frac{d/\gamma^{3}}{\log\frac{sw\gamma^{3}}{nd}})$ [9] | $\Theta(1)$ for poly; $\omega(1)$ [1] for sub-poly |
| $\lambda$-ANNS | | $\Theta(1)$ for poly; $\omega(1)$ [1] for sub-poly; via LSH: $s=O(dn+n^{1+\rho})$, $t=O(n^{\rho})$ with $\rho\leq 1/\gamma$ [4] and $\rho=\Theta(1/\gamma)$ [5] |
| ANNS | $\Theta(\frac{\log d}{\log\gamma})$ for poly; $\Omega(\frac{\log n}{\log\frac{swd}{n\log n}})$ [8] | $\Theta(\frac{\log\log d}{\log\log\log d})$ [3] for poly; $\Omega(\log\log d)$ [7] for linear; $\Omega(\frac{\log n}{\gamma^{2}\log\frac{sw}{n}})$ [8] |
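To make the LSH row concrete, here is a minimal sketch of bit-sampling LSH for Hamming space, the family underlying [4]. This is an illustrative toy, not the tuned construction from the paper: the class name, parameter choices ($k$ sampled coordinates per hash, $L$ tables), and bucket layout are all assumptions of this sketch.

```python
import random
from collections import defaultdict

def hamming(x, y):
    # Hamming distance between two equal-length bit tuples.
    return sum(a != b for a, b in zip(x, y))

class BitSamplingLSH:
    """Toy bit-sampling LSH for Hamming space: each of the L hash
    functions projects a point onto k randomly chosen coordinates."""

    def __init__(self, d, k, L, seed=0):
        rng = random.Random(seed)
        # For each of the L tables, fix k random coordinates to sample.
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, x, i):
        # Hash value of x under the i-th function: its sampled bits.
        return tuple(x[j] for j in self.coords[i])

    def insert(self, x):
        for i, table in enumerate(self.tables):
            table[self._key(x, i)].append(x)

    def query(self, q, radius):
        # Probe q's bucket in each table; return any stored point
        # within `radius` of q, or None if none is found.
        for i, table in enumerate(self.tables):
            for z in table.get(self._key(q, i), []):
                if hamming(z, q) <= radius:
                    return z
        return None
```

Close points collide on the sampled bits with higher probability than far points, which is what drives the $t=O(n^{\rho})$ query bound; the filter by `radius` ensures the sketch never returns a point farther than the stated distance.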

## 2 Explanations

Data structure problem: Given a database $y\in\binom{\{0,1\}^{d}}{n}$ ($n$ points in $d$-dimensional Hamming space), we are required to construct a data structure (modeled as a table of $s$ cells, each containing at most $w$ bits [10]) such that every query $x\in\{0,1\}^{d}$ (a point in $d$-dimensional Hamming space) can be answered within $t$ cell probes (i.e. by accessing at most $t$ cells).
Adaptivity: A cell-probing algorithm is fully adaptive if it makes $t$ rounds of cell probes, one probe per round, so each probed address may depend on the contents of previously read cells. It is non-adaptive if it makes only $1$ round, containing all $t$ probes; i.e. a non-adaptive algorithm must decide which cells to access as soon as it receives the query, without knowing anything about the current database $y$.
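A toy illustration of the cell-probe accounting, in the spirit of [10]: membership in a sorted table answered by binary search costs $t=O(\log s)$ probes, and the algorithm is fully adaptive, since each probed address depends on the content of the previously read cell. The class and function names below are assumptions of this sketch.

```python
class CellProbeTable:
    """A table of cells; reading a cell costs one probe."""

    def __init__(self, cells):
        self.cells = cells   # the data structure: a list of w-bit cells
        self.probes = 0      # cell-probe counter

    def read(self, i):
        self.probes += 1
        return self.cells[i]

def member(table, s, x):
    """Adaptive binary search over a sorted table of s cells:
    O(log s) probes, each address chosen from earlier cell contents."""
    lo, hi = 0, s - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        v = table.read(mid)
        if v == x:
            return True
        if v < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False
```

A non-adaptive algorithm, by contrast, would have to name all $t$ addresses up front from $x$ alone, before the first `read` returns.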

## 3 Notations

• poly:

$s=n^{O(1)}$.

• sub-poly:

$s=n^{o(1)}$.

• linear:

$s=O(n)$.

• dist:

Hamming distance.

• $\lambda$-ENN:

Is there a $\lambda$-near neighbor of $x$? The answer is NO if $\neg\exists z\in y(\mathrm{dist}(z,x)\leq\lambda)$, otherwise the answer is YES.

• $\lambda$-ENNS:

Is there a $\lambda$-near neighbor of $x$? Return one if it exists. The answer is NO if $\neg\exists z\in y(\mathrm{dist}(z,x)\leq\lambda)$; otherwise the answer is an arbitrary such $z$.

• ENNS:

Exactly nearest neighbor search. The answer is a $z\in y$ such that $\forall z^{\prime}\in y(\mathrm{dist}(z,x)\leq\mathrm{dist}(z^{\prime},x))$.

• $\lambda$-ANN:

$\gamma$-approximately, is there a $\lambda$-near neighbor of $x$? The answer is NO if $\neg\exists z\in y(\mathrm{dist}(z,x)\leq\gamma\cdot\lambda)$, and YES if $\exists z\in y(\mathrm{dist}(z,x)\leq\lambda)$; otherwise the answer may be arbitrary.

• $\lambda$-ANNS:

$\gamma$-approximately, is there a $\lambda$-near neighbor of $x$? Return a $\gamma$-approximate $\lambda$-near neighbor of $x$ if one exists. If $\neg\exists z\in y(\mathrm{dist}(z,x)\leq\gamma\cdot\lambda)$, the answer is NO. If $\exists z\in y(\mathrm{dist}(z,x)\leq\lambda)$, the answer is a $z\in y$ such that $\forall z^{\prime}\in y(\mathrm{dist}(z,x)\leq\gamma\cdot\mathrm{dist}(z^{\prime},x))$. Otherwise the answer can be either NO or such a $z$.

• ANNS:

Approximate nearest neighbor search. The answer is a $z\in y$ such that $\forall z^{\prime}\in y(\mathrm{dist}(z,x)\leq\gamma\cdot\mathrm{dist}(z^{\prime},x))$.
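The notations above can be pinned down by naive linear-scan reference implementations (no data structure, just the input/output contracts). A minimal sketch; the function names are assumptions, and `ann` encodes YES/NO/"arbitrary" as `True`/`False`/`None`:

```python
def dist(x, z):
    # Hamming distance between two equal-length bit tuples.
    return sum(a != b for a, b in zip(x, z))

def enn(y, x, lam):
    # λ-ENN: is there a z in y with dist(z, x) <= λ?
    return any(dist(z, x) <= lam for z in y)

def enns(y, x, lam):
    # λ-ENNS: return some λ-near neighbor of x, or None if none exists.
    return next((z for z in y if dist(z, x) <= lam), None)

def nearest(y, x):
    # ENNS: an exact nearest neighbor of x in y.
    return min(y, key=lambda z: dist(z, x))

def ann(y, x, lam, gamma):
    # λ-ANN: YES if some z has dist <= λ, NO if no z has dist <= γλ;
    # in between, any answer is acceptable (returned here as None).
    d = min(dist(z, x) for z in y)
    if d <= lam:
        return True
    if d > gamma * lam:
        return False
    return None
```

These scans take $\Theta(nd)$ time; the whole point of the bounds in Section 1 is how much faster a preprocessed table of $s$ cells can answer the same questions.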

## References

• 1 Andoni, A., Indyk, P. and Pǎtraşcu, M., 2006, October. On the optimality of the dimensionality reduction method. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on (pp. 449-458). IEEE.
• 2 Barkol, O. and Rabani, Y., 2000, May. Tighter bounds for nearest neighbor search and related problems in the cell probe model. In Proceedings of the thirty-second annual ACM symposium on Theory of computing (pp. 388-396). ACM.
• 3 Chakrabarti, A. and Regev, O., 2004, October. An optimal randomised cell probe lower bound for approximate nearest neighbour searching. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on (pp. 473-482). IEEE.
• 4 Indyk, P. and Motwani, R., 1998, May. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing (pp. 604-613). ACM.
• 5 Motwani, R., Naor, A. and Panigrahy, R., 2007. Lower bounds on locality sensitive hashing. SIAM Journal on Discrete Mathematics, 21(4), pp.930-935.
• 6 Pǎtraşcu, M. and Thorup, M., 2006, October. Higher lower bounds for near-neighbor and further rich problems. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on (pp. 646-654). IEEE.
• 7 Pǎtraşcu, M. and Thorup, M., 2007, January. Randomization does not help searching predecessors. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 555-564). Society for Industrial and Applied Mathematics.
• 8 Panigrahy, R., Talwar, K. and Wieder, U., 2008, October. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Foundations of Computer Science, 2008. FOCS’08. IEEE 49th Annual IEEE Symposium on (pp. 414-423). IEEE.
• 9 Wang, Y. and Yin, Y., 2014. Certificates in data structures. In Automata, Languages, and Programming (pp. 1039-1050). Springer Berlin Heidelberg.
• 10 A. C.-C. Yao. Should tables be sorted? Journal of the ACM (JACM), 28(3):615–628, 1981.