User:Jeblad/Detect text segments with a convolving hash

Detecting text segments with a convolving hash is an attempt to describe an algorithm for text synchronization.

Assume there is a text $$A$$ and a smaller segment $$B$$. Let there be an index $$i$$ into $$B$$, and an offset $$k$$ into $$A$$. Assume there is a hash function $$\operatorname{h}(\cdot)$$ that takes a character (or code point) and folds it into a smaller range, and let the size of this range be the length of the wanted digest. Let the needle digest, computed from the smaller segment, be $$\mathbf{X}$$, and let the haystack digest be $$\mathbf{Y}$$. These digests are vectors, but can be reprocessed into binary numbers. For simplicity, call the unprocessed digests the needle vector and the haystack vector, and call their truncated versions digests.

Calculate the needle digest as

$$\mathbf{foreach}~\mbox{char}~i~\mbox{in}~B~\mathbf{do}~X_{\operatorname{h}(B_{i})} \leftarrow X_{\operatorname{h}(B_{i})} + 1 $$
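
The counting loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not fix a particular $$\operatorname{h}(\cdot)$$, so the fold used here (code point modulo the digest length) and the names `fold`, `needle_vector` and `digest_len` are hypothetical choices, not part of the described algorithm.

```python
def fold(ch: str, digest_len: int) -> int:
    """Hypothetical h(.): fold a code point into the range [0, digest_len)."""
    return ord(ch) % digest_len


def needle_vector(segment: str, digest_len: int = 16) -> list[int]:
    """Build the needle vector X by counting chars of B per hash bucket."""
    x = [0] * digest_len
    for ch in segment:
        # X[h(B_i)] <- X[h(B_i)] + 1
        x[fold(ch, digest_len)] += 1
    return x


if __name__ == "__main__":
    print(needle_vector("abca", digest_len=4))
```

Because each character increments exactly one bucket, the entries of the vector always sum to the length of the segment, and the result does not depend on the order of the characters.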

Calculate the haystack digest for a window as