Someone's Intermediate Representation

0%

All info comes from Manuel’s slides on Lecture 4.

# Intros on Hash Functions

What if we want some high-level of fingerprint that can be built from any data, with a tiny change in data cause radical impact on the whole fingerprint.

## Hash Function Definitions

The following properties are required for hash function $h$:

• Efficiently computed for any input.
• Pre-image resistant: given $y$ it is computationally infeasible to find $x$ such that $h(x) = y$.
• Second pre-image resistant: given $x$, it is computationally infeasible to find $x’\neq x$ with $h(x) = h(x’)$.
• Collision resistant: it is computationally infeasible to find $\langle x,x’\rangle$ with $x\neq x’$ and $h(x) = h(x’)$.

The output of hash function is called message digest.

The collision resistant is the most general of 3 resistance properties.

# DLP Hash Function

Let $p$ prime and $q = \dfrac{p-1}{2}$ is also a prime and choose $\alpha,\beta$ two generator of $U(\mathbb{Z}/p\mathbb{Z})$. Donate $x \in \mathbb{Z}/q^2 \mathbb{Z}$ as $x_0 + x_1 q$ with $0\le x_0,x_1\le q-1$, and define $h = \alpha^{x_0}\beta^{x_1}\pmod p$.

We can prove the previous function $h$ is collision resistant.

## Lemma and Proposal

Lemma: Let $a,b\in\mathbb{Z}$ and $m\in\mathbb{N}\backslash\lbrace 0\rbrace$ and $d = \gcd(a,m)$. Linear congruence $ax \equiv b \pmod m$ has solution iff $d \mid b$. In this case, there are $d$ solutions that are mutually incongruent mod $m$.

Proposal: If $x \neq x’, h(x) = h(x’)$ are known, then $\log_\alpha \beta$ can be easily computed.

## Proof

Since we have $x = x_0+x_1 q$ and $x’ = x_0’ + x_1’q$, we have $\alpha^{x_0 + x_1 a}\equiv \alpha^{x_0’+x_1’ a}\pmod p$ where $a = \log_\alpha \beta$.

By FST we know $x_0 + x_1 a \equiv x_0’ + x_1’ a \pmod {(p-1)}$, we have $a(x_1 - x_1’) \equiv (x_0’ - x_0)\pmod {(p-1)}$.

Suppose $x_1 - x_1’ \neq 0$, then $d = \gcd(x_1 - x_1’,p-1)$ solutions for $a$ here. But as $q = \dfrac{p-1}{2}$ is prime, only $1,2,q,p-1$ as factor.

Since $x_1 - x_1’ \in \lbrack -(q-1),(q-1)\rbrack$, so $d$ can only be 1 or 2. Thus at most 2 solutions should be tested to determine $a$, hence finding $x\neq x’,h(x) = h(x’)$ implies solving DLP.

# Birthday Attack

The essence of the birthday attack can be expressed by considering birthday of 23 people. If all of them have different birthdays, then the possibility of having such people is $\prod\limits_{i=1}^{22}\dfrac{365 - i}{365}$, close to $\dfrac{1}{2}$.

The possibility of at least 2 sharing same birthday is $1 - \prod\limits_{i=1}^{22}\dfrac{365 - i}{365} = 0.507 > \dfrac{1}{2}$.

In a more general case, if there are $n$ possible birthdays and $r$ test inputs, we can have the result on birthday collision possibility.

First, $-x-x^2 \le \ln(1-x)\le -x$. Then when $r \le \dfrac{n}{2}$, we can add $x = \dfrac{j}{n}$ with $j \in \lbrack 1, r-1\rbrack$. In this way $-\dfrac{(r-1)r}{2n}-\dfrac{r^3}{3n^2}\le \sum\limits^{r-1}_{j=1}\ln(1-\dfrac{j}{n})\le -\dfrac{(r-1)r}{2n}$.

With both sides on exponent on $e$, then $e^{-\frac{(r-1)r}{2n}-\frac{r^3}{3n^2}}\le \prod\limits^{r-1}_{j=1}(1-\dfrac{j}{n})\le e^{-\frac{(r-1)r}{2n}}$. Let $\lambda = \dfrac{r^2}{2n}$, we have $e^{-\lambda}e^{\frac{c_1}{\sqrt n}}\le \prod\limits^{r-1}_{j=1}(1-\dfrac{j}{n}) \le e^{-\lambda} e^{\frac{c_2}{\sqrt{n}}}$, where $c_1 = \sqrt{\dfrac{\lambda}{2}} - \dfrac{(2\lambda)^{\frac{3}{2}}}{3}$ and $c_2=\sqrt{\dfrac{\lambda}{2}}$.

It is easy to see that $\lambda \le \dfrac{n}{8}$. Then $\prod\limits^{r-1}_{j=1} (1-\dfrac{j}{n}) \approx e^{-\lambda}$.

Consider 2 people from a group want to choose, causing a match, then the possibility may yields that $-\lambda = -\ln(2)$, or so to say $r \approx 1.1174 \sqrt n$. Thus $r$ has a bound in $O(\sqrt n)$.

Consider 2 group of people, then consider each object is not over-chosen in one group. The possibility of $i$ matches is $\Big(\dfrac{r^2}{n} \Big)^i \dfrac{e^{-\frac{r^2}{n}}}{i!}$. (unproved)

In this way, we can attack a hash function $h$ by attacking about $O(\sqrt n)$ random $x$ and find a collision, if possible.

## Improved Birthday Attack

We can get idea from Pollard’s rho by setting $x_i = h(x_{i-1})$ and $x_{2i} = (h \circ h)(x_{2(i-1)})$. A collision on $x_{i-1}$ and $h(x_{2(i-1)})$ is found as soon as $x_{i} = x_{2i}$.

This decreases the storage necessary by iterating over computation results.

# MD Construction

On designing hash functions, we have difficulty as:

• Infinite number of possible input.
• Finite number of possible output.

The conclusion says that: any hash function has an infinite number of collisions.

MD Construction is a method to convert a hash function on string of fixed length into a hash function accepting arbitrary input length.

If the original hash function is collision resistant, so is the constructed one.

## Definitions

Compression function is a function $g$ defined by $g:\lbrace 0,1\rbrace^{m+t}\longrightarrow \lbrace 0,1\rbrace ^m$, $t \ge 1$.

Iterated hash function is a hash function constructed by iteratively applying a compression function.

Let $x \parallel y$ be concatenation of $x$ and $y$. Then we donate $x \parallel \dots \parallel x$ as $x^k$.

Let $a$ be an integer, then $(a)_2$ is its binary representation.

The MD construction has the form like

• Split $x$ into $k = \Big\lceil \dfrac{n}{t-1}\Big\rceil$ blocks
• Set $y_k = x_k \parallel 0^d$, $y_{k+1} = (d)_2$
• $z_1 = g(0^{m+1}\parallel y_1)$, $z_{i+1} = g(z_i \parallel 1 \parallel y_{i+1})$
• $h(x) = z_{k+1}$

## MD Theorem

Let $g$ be collision resistant compression function $\lbrace 0,1\rbrace^{m+t}\longrightarrow \lbrace 0,1\rbrace^m$, with $t\ge 2$. Then MD construction is a collision resistant hash function.

Suppose we have a collision on $h$, which means $x \neq x’$ and $h(x) = h(x’)$, we can prove that a collision on $g$ can also be efficiently found.

First, we note if $|x|\neq |x’|$, then 2 different values $d$ and $d’$, respectively. Suppose $k + 1$ and $k’ + 1$ donate the $y-$block for $x$ and $x’$.

1. Consider $x \ne x’$ with $|x| \not\equiv |x’|\pmod {(t-1)}$. Then $d\ne d’$ and $y_{k+1}\ne y_{k’+1}’$. We then have

\begin{aligned} g(z_k\parallel 1 \parallel y_{k+1}) &= z_{k+1} = h(x)\newline &=h(x’) =z_{k’+1}’\newline &=g(z_{k’}’\parallel 1\parallel y_{k’+1}’) \end{aligned}
which is a collision on $g$ since $y_{k+1}\ne y_{k’+1}’$.

2. Consider 2 cases

1. $|x| \equiv |x’| \pmod {(t-1)}$ with $k=k’$. This implies $y_{k+1} = y_{k’+1}’$, then we have
\begin{aligned} g(z_k\parallel 1 \parallel y_{k+1}) &= z_{k+1} = h(x)\newline &=h(x’) =z_{k+1}’\newline &=g(z_{k}’\parallel 1\parallel y_{k+1}’) \end{aligned}
If $z_k \ne z_k’$ then a collision is found, otherwise we repeat the process and find the collision point.

2. $|x| \equiv |x’| \pmod {(t-1)}$ with $k\ne k’$. WLOG assume $k’ > k$, then proceed as previous 2.1 solution.

If no collision is found before $k=1$ then consider before $k=1$,

By construction, $m+1^{st}$ bit on left hand side is 0 while it is 1 on right hand side. Collision found.

\begin{aligned} g(0^{m+1}\parallel y_1) &= z_1\newline &= z_{k’-k+1}’\newline &=g(z_{k’-k}’\parallel 1\parallel y_{k’-k+1}’) \end{aligned}

# Secure Hash Algorithm

Currently, SHA-0/1 are broken. SHA-3 is out for safety.

If message is $x$, we append 1 to $x$, then append $0^d$ with $1 + |x| \equiv -64 \pmod {512}$, then append $|x|$ in base 2 over 64 bits.

In this way, the padded value $y$ is of form $x \parallel 1 \parallel 0^d\parallel |x|_{2,64}$. By construction $|y|\equiv 0\pmod {512}$. Break $y$ into $k = \Big\lfloor \dfrac{|x|}{512}\Big\rfloor + 1$ blocks.

## SHA-1 Algorithm

\begin{aligned} &\text{Input: }x\text{ a bit string}\newline &\text{Output: }h(x)\text{, where }h\text{ is SHA-1}\newline &H_0\leftarrow 67452301; H_1\leftarrow EFCDAB89; H_2\leftarrow 98BADCFE;\newline &H_3\leftarrow 10325476; H_4\leftarrow C3D2E1F0;\newline &d \leftarrow (447 - |x|) \pmod {512};\newline &y \leftarrow (x \parallel 1 \parallel 0^d\parallel (|x|)_2);\newline &\text{for }i\leftarrow 1 \text{ to }k\text{ do}\newline &\phantom{“”””}H_0,H_1,H_2,H_3,H_4\leftarrow \text{compress}(H_0, H_1,H_2,H_3,H_4,y_i)\newline &\text{end for}\newline &\text{return } H_0\parallel H_1\parallel H_2\parallel H_3\parallel H_4 \end{aligned}

## Compression Function

The functions $f_0,\dots,f_{79}$ are defined by
$$f_{i}(B,C,D) = \begin{cases} (B \wedge C) \vee (\not B\wedge D) & 0 \le i \le 19\newline B\oplus C \oplus D & 20 \le i \le 39\newline (B \wedge C )\vee (B \wedge D)\vee (C \wedge D) & 40 \le i \le 59\newline B \oplus C \oplus D & 60 \le i \le 79 \end{cases}$$
The constants $K_0,\dots,K_{79}$ are defined by
$$K_i = \begin{cases} 5A827999 & 0 \le i \le 19\newline 6ED9EBA1 & 20 \le i \le 39\newline 8F1BBCDC & 40 \le i \le 59\newline CA62C1D6 & 60 \le i \le 79 \end{cases}$$
The algorithm goes like
\begin{aligned} &\text{Input: 32-bit values }H_0,\dots,H_4, \text{512-bit block }y\newline &\text{Output: 32-bit values }H_0,\dots, H_4\newline &\text{Function compress}(H_0,\dots,H_4):\newline &\phantom{“”””}\text{split }y\text{ into }16\text{ words }W_0,\dots,W_{15};\newline &\phantom{“”””}\text{for }i\leftarrow 16\text{ to }79\text{ do}\newline &\phantom{“”””}\phantom{“”””}W_i\leftarrow ROTL(W_{i-3}\oplus W_{i-8}\oplus W_{i-14}\oplus W_{i-16})\newline &\phantom{“”””}\text{end for}\newline &\phantom{“”””}A\leftarrow H_0;B \leftarrow H_1;\dots E \leftarrow H_4;\newline &\phantom{“”””}\text{for }i\leftarrow 0\text{ to }79\text{ do}\newline &\phantom{“”””}\phantom{“”””}T \leftarrow ROTL^5(A) + f_i(B,C,D)+E + W_i + K_i;\newline &\phantom{“”””}\phantom{“”””}E \leftarrow D;D \leftarrow C;\newline &\phantom{“”””}\phantom{“”””}C \leftarrow ROTL^{30}(B);\newline &\phantom{“”””}\phantom{“”””}B \leftarrow A;A\leftarrow T;\newline &\phantom{“”””}\text{end for}\newline &\phantom{“”””}H_0\leftarrow H_0 + A; H_1 \leftarrow H_1 + B;\dots H_4 \leftarrow H_4 + E;\newline &\phantom{“”””}\text{return }H_0,\dots,H_4\newline &\text{end} \end{aligned}