Someone's Intermediate Representation


All info comes from David Wu’s Lecture and Boneh-Shoup Book.

This note will focus mainly on perfect security, semantic security, and PRGs (pseudorandom generators).

The overall goal of cryptography is to secure communication over an untrusted network. Two things must be achieved:

• Confidentiality: no one can eavesdrop on the communication
• Integrity: no one can tamper with the communication

# Perfect Security

A cipher $(Enc, Dec)$ satisfies perfect security if $\forall m_0, m_1 \in M$ and $\forall c\in C$, $\Pr[k\overset{R}{\longleftarrow} K: Enc(k, m_0) = c] = \Pr[k\overset{R}{\longleftarrow} K:Enc(k,m_1) = c]$.

The $k$ in the two probabilities are sampled independently; each probability simply equals $\dfrac{\text{number of }k\text{ such that }Enc(k, m) = c}{|K|}$.

## OTP is Perfectly Secure

For every fixed $m \in \lbrace 0, 1\rbrace^n$ and every $c \in \lbrace 0, 1\rbrace^n$, there is a unique $k \in \lbrace 0, 1\rbrace^n$ such that $m \oplus k = c$.

Plugging into the definition of perfect security: exactly one $k$ encrypts $m$ to $c$, so $\Pr = \dfrac{1}{|K|} = \dfrac{1}{2^n}$ for every message, and the equation is satisfied.
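A minimal Python sketch of the one-time pad (my own illustration, not from the lecture) makes the uniqueness of the key concrete:

```python
import secrets

def enc(k: bytes, m: bytes) -> bytes:
    # One-time pad: XOR each message byte with the corresponding key byte.
    assert len(k) == len(m)
    return bytes(a ^ b for a, b in zip(k, m))

dec = enc  # XOR is its own inverse, so Dec = Enc

# Encrypting with a fresh random key:
m = b"hi"
k = secrets.token_bytes(len(m))
assert dec(k, enc(k, m)) == m

# For a fixed 1-byte (m, c) pair, exactly one key maps m to c:
m1, c1 = 0x05, 0xA7
keys = [k1 for k1 in range(256) if k1 ^ m1 == c1]
assert len(keys) == 1  # so Pr[Enc(k, m) = c] = 1/|K| for every m
```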

If a cipher is perfectly secure, then $|K| \ge |M|$.

Assume $|K| < |M|$; we want to show the cipher is not perfectly secure. Pick $k_0 \in K$ and $m_0 \in M$, and let $c \leftarrow Enc(k_0, m_0)$. Let $S = \lbrace Dec(k, c): k \in K\rbrace$; then $|S| \le |K| < |M|$.

Clearly $\Pr\lbrack k \overset{R}{\longleftarrow} K: Enc(k, m_0) = c\rbrack > 0$. But if we choose $m_1 \in M \backslash S$, then no $k \in K$ satisfies $Enc(k, m_1) = c$ (otherwise $Dec(k, c) = m_1$ would put $m_1$ in $S$), so that probability is $0$. Thus the cipher is not perfectly secure. $\square$

Isomorphisms, or invertible maps, will be discussed in this chapter.

## Isomorphism as Invertible Map

The crucial property of such a map $f: A \longrightarrow B$ is that an inverse map $g$ exists, satisfying $g \circ f \equiv 1_A$ and $f \circ g \equiv 1_B$.

A similarity between two collections can be exhibited by choosing a map that sends each element of the first collection to one of the second.

Sets, maps, and map composition will be discussed in this note.

If we ignore the values of conditions by treating if or while statements as nondeterministic choices between two branches, the resulting analyses are called path-insensitive.

Path-sensitive analysis is used to recover the precision that path-insensitive analysis loses to its pessimistic assumptions.

Interval analysis can be used for reasoning about integer representations or for array-bounds checks. It involves widening and narrowing.

A lattice of intervals is defined as $\text{Interval} \triangleq \text{lift}(\lbrace \lbrack l,h\rbrack \mid l,h\in\mathbb{Z} \wedge l \leq h\rbrace)$. The partial order is defined as $\lbrack l_1,h_1\rbrack \sqsubseteq \lbrack l_2,h_2\rbrack \iff l_2 \leq l_1 \wedge h_1 \le h_2$.

The top element is $\lbrack -\infty,+\infty\rbrack$ and the bottom is $\bot$, which denotes no integer. Since chains in this partial order can have infinite length, the lattice has infinite height.

The total lattice for a program point is $L = \text{Vars}\to \text{Interval}$, which provides bounds for each integer variable.

The constraint rules are listed as
\begin{aligned} JOIN(v) &= \bigsqcup_{w\in pred(v)}\lbrack \lbrack w\rbrack \rbrack \newline \lbrack\lbrack X = E\rbrack\rbrack:\quad \lbrack\lbrack v\rbrack\rbrack &= JOIN(v) \lbrack X \mapsto \text{eval}(JOIN(v), E)\rbrack\newline \text{eval}(\sigma, X) &= \sigma(X)\newline \text{eval}(\sigma, I) &= \lbrack I, I\rbrack\newline \text{eval}(\sigma, \text{input}) &= \lbrack -\infty, \infty\rbrack\newline \text{eval}(\sigma, E_1\ op\ E_2) &= \hat{op}(\text{eval}(\sigma, E_1), \text{eval}(\sigma, E_2))\newline \hat{op}(\lbrack l_1,h_1\rbrack, \lbrack l_2,h_2\rbrack) &= \Big\lbrack \min_{x\in \lbrack l_1,h_1\rbrack,\, y\in \lbrack l_2,h_2\rbrack} x\ op\ y,\ \max_{x\in \lbrack l_1,h_1\rbrack,\, y\in \lbrack l_2,h_2\rbrack }x\ op\ y\Big\rbrack \newline \text{(other nodes)}:\quad \lbrack\lbrack v\rbrack\rbrack &= JOIN(v)\newline \lbrack \lbrack exit\rbrack\rbrack &= \varnothing \end{aligned}
The fixed-point algorithm we discussed previously is restricted to lattices of finite height. A new fixed-point algorithm, based on widening, is needed in practice.
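To see why widening makes iteration terminate on this infinite-height lattice, here is a small Python sketch (my own, not from the slides; `widen` uses the usual jump-to-infinity heuristic) analyzing the loop `x = 0; while input: x = x + 1`:

```python
import math

INF = math.inf
BOT = None  # bottom: no integer

def join(a, b):
    # Least upper bound of two intervals.
    if a is BOT: return b
    if b is BOT: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Jump any unstable bound to infinity so the ascending chain stabilizes.
    if old is BOT: return new
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

def add(iv, k):  # abstract transformer for x + k
    return BOT if iv is BOT else (iv[0] + k, iv[1] + k)

# Kleene iteration with widening for: x = 0; while input: x = x + 1
x = BOT
for _ in range(100):
    nxt = join((0, 0), add(x, 1))  # loop-head join: init or increment
    nxt = widen(x, nxt)
    if nxt == x:
        break
    x = nxt
assert x == (0, INF)  # stabilizes in a few steps instead of diverging
```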

More flow-sensitive analyses (data-flow analyses) on the way:

| | Forward Analysis | Backward Analysis |
| --- | --- | --- |
| May Analysis | Reaching Definitions | Liveness |
| Must Analysis | Available Expressions | Very Busy Expressions |

This part will take notes about Lattice Theory.

Appetizer: sign analysis can be done by first constructing a lattice with elements $\lbrace +,-,0,\top,\bot\rbrace$, where:

• $+$, $-$, $0$ stand for the signs of integer values
• $\top$ stands for any integer value, while $\bot$ means the empty set of integer values.

$$\begin{array}{ccccc} & & \top& & \newline & \swarrow & \downarrow & \searrow \newline & + & 0 &- \newline & \searrow & \downarrow & \swarrow \newline & &\bot \end{array}$$
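The order and join of this little lattice can be modeled concretely. The following Python sketch (my own, not from the slides) represents each abstract sign by a small set of representative integers, so $\sqsubseteq$ becomes set inclusion:

```python
# Finite model of the sign lattice: each element maps to the set of
# representative integers {-1, 0, 1} it describes (an assumption of this
# sketch, not the full concretization to all of Z).
gamma = {
    "bot": set(), "+": {1}, "0": {0}, "-": {-1}, "top": {-1, 0, 1},
}

def leq(a, b):
    # a ⊑ b iff a's meaning is contained in b's.
    return gamma[a] <= gamma[b]

def lub(a, b):
    # Least upper bound: the smallest element covering both meanings.
    candidates = [s for s in gamma if gamma[a] | gamma[b] <= gamma[s]]
    return min(candidates, key=lambda s: len(gamma[s]))

assert lub("+", "0") == "top"   # + and 0 only meet at the top
assert lub("bot", "-") == "-"   # bot is the identity for join
```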

All info comes from Manuel’s slides on Lecture 4.

# Intros on Hash Functions

What if we want some high-level fingerprint that can be built from any data, such that a tiny change in the data causes a radical change in the whole fingerprint?
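Cryptographic hash functions provide exactly this. A quick Python illustration (my own, using SHA-256 as a stand-in for whichever hash the slides discuss):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    # SHA-256 digest as a hex string: a fixed-size fingerprint of any data.
    return hashlib.sha256(data).hexdigest()

a = fingerprint(b"attack at dawn")
b = fingerprint(b"attack at dusk")  # one-word change in the input
assert a != b and len(a) == len(b) == 64

# The digests typically differ in roughly half of their bits (the
# avalanche effect), not just in a few positions near the change.
diff = sum(x != y for x, y in zip(a, b))
```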

All info comes from Manuel’s slides on Lecture 3.

# Public-Key Cryptosystem

Encryption depends on a public key $K$ and decryption depends on a secret key $K'$.

Finding $K'$ given $K$ is computationally infeasible.
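A toy textbook-RSA sketch in Python (my own illustration with the standard tiny example $p=61$, $q=53$; completely insecure at this size) shows the $K$ / $K'$ split, with the infeasibility of finding $K'$ resting on factoring $n$:

```python
# Toy RSA (illustration only; real RSA uses ~2048-bit primes and padding).
p, q = 61, 53
n = p * q                  # 3233; published as part of the public key
e = 17                     # public exponent: K = (n, e)
phi = (p - 1) * (q - 1)    # 3120; recoverable only by factoring n
d = pow(e, -1, phi)        # secret exponent: K' = (n, d)

m = 65
c = pow(m, e, n)           # anyone can encrypt with the public key
assert pow(c, d, n) == m   # only the holder of K' can decrypt
```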

All info comes from Manuel’s slides on Lecture 2.

# Block Cipher

A block cipher is composed of 2 co-inverse functions:
\begin{aligned} E:\lbrace 0,1\rbrace^n\times\lbrace0,1\rbrace^k&\to\lbrace 0,1\rbrace^n& D:\lbrace 0,1\rbrace^n\times\lbrace0,1\rbrace^k&\to\lbrace 0,1\rbrace^n\\ (P,K)&\mapsto C&(C,K)&\mapsto P \end{aligned}
where $n,k$ denote the sizes of a block and a key, respectively.

The goal is, given a key $K$, to design an invertible function $E(\cdot, K)$ whose output cannot be distinguished from a random permutation over $\lbrace 0, 1\rbrace^n$.
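The shape of $E$ and $D$ can be sketched with a toy 8-bit "cipher" of my own (not secure in any way; only the co-inverse and permutation properties are the point):

```python
def rotl8(x, r):
    # Rotate an 8-bit value left by r bits (r in 0..8).
    return ((x << r) | (x >> (8 - r))) & 0xFF

def E(P, K):
    # E : {0,1}^8 x {0,1}^8 -> {0,1}^8 : XOR with the key, then rotate.
    return rotl8(P ^ K, K % 8)

def D(C, K):
    # D undoes E step by step, in reverse order.
    return rotl8(C, 8 - K % 8) ^ K

K = 0b10110100
for P in range(256):
    assert D(E(P, K), K) == P  # co-inverse on every block

# For a fixed key, E(., K) is a permutation of the block space:
assert sorted(E(P, K) for P in range(256)) == list(range(256))
```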

All info comes from Manuel’s slides on Lecture 1.

# Intros on System FPC and Recursive Types

FPC is a language with products, sums, partial functions, and recursive types.

Recursive types are solutions to type equations $t\cong\tau$ in which there is no restriction on the occurrences of $t$ in $\tau$. Equivalently, a recursive type is a fixed point, up to isomorphism, of the associated unrestricted type operator $t.\tau$. With the restriction removed, we may obtain solutions of equations such as $t\cong t\rightharpoonup t$, which describes a type isomorphic to the type of partial functions defined on itself.

Types are not sets: they classify computable functions, not arbitrary functions. With types we may therefore solve such type equations. The penalty is that we must admit non-termination: type equations involving function types have solutions only if the functions involved are partial.

A benefit of working in the setting of partial functions is that type equations have unique solutions (up to isomorphism). But what about the inductive and coinductive types that solve the same type equation? This turns out to depend on the fixed dynamics, be it lazy or eager:

• Under a lazy dynamics, recursive types have a coinductive flavor, and inductive analogs are inaccessible.
• Under an eager dynamics, recursive types have an inductive flavor, but coinductive analogs are accessible as well.

# Intros on General Recursion and Fixpoint

System T is introduced as a basis for discussing total computations, those for which the type system guarantees termination.

Language M generalizes T to admit inductive and coinductive types, while preserving totality.

PCF is introduced here as a basis for discussing partial computations, those that may not terminate when evaluated, even if they are well typed.

Though this seems like a disadvantage, it gives PCF greater expressive power than is possible in T.

The source of partiality in PCF is the concept of general recursion, which permits the solution of equations between expressions. The advantages and disadvantages are clear:

• The price of admitting solutions to all such equations is that the computation may not terminate: the solution of some equations may be divergent or undefined.
• The advantage is that termination proofs need not be embedded in the code, resulting in shorter code.

A solution to such a system of equations is a fixed point of an associated functional (higher-order function).
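This can be sketched in Python (my own illustration): `fix` returns a solution of the equation $f = F(f)$ for a functional $F$, unfolding one step of the recursion on demand:

```python
# General recursion as the fixed point of a functional.
def fix(F):
    def f(x):
        return F(f)(x)   # unfold one step of the recursion on demand
    return f

# The factorial equation: f(n) = 1 if n == 0 else n * f(n - 1).
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = fix(F)
assert fact(5) == 120

# The price of generality: some equations only have divergent solutions,
# e.g. fix(lambda f: lambda n: f(n)) loops forever on any argument.
```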

This part is not shown on PFPL preview version.

# Intros and Examples

Concept of type quantification leads to consideration of quantification over type constructors, which are functions mapping types to types.

For example, take queues: the abstract type constructor can be expressed by the existential type $\exists q::\text{T}\to\text{T}.\sigma$, where $\sigma$ is the labeled tuple type below and the existential quantifies over the constructor $q$ of kind $\text{T}\to\text{T}$.
$$\begin{matrix} \langle&&\text{emp}&&\hookrightarrow&&\forall t::\text{T}.q\lbrack t\rbrack,&&\\ &&\text{ins}&&\hookrightarrow&&\forall t::\text{T}.t\times q\lbrack t\rbrack\to q\lbrack t\rbrack,&&\\ &&\text{rem}&&\hookrightarrow&&\forall t::\text{T}.q\lbrack t\rbrack\to(t\times q\lbrack t\rbrack)\space\text{opt}&&\rangle \end{matrix}$$
Language $\text{F}_\omega$ enriches language F with universal and existential quantification over kinds. This extension accounts for definitional equality of constructors.

# Data Abstraction and Existential Types

Data abstraction has great importance for structuring programs. The main idea is to introduce an interface that serves as a contract between the client and implementor of an abstract type. The interface specifies

• what the client relies on to do its work.
• what the implementor must provide to satisfy the contract.

The interface isolates the client from the implementor so that the two can be developed independently. This property is called representation independence for abstract types.

Existential types are introduced as an extension of language F to formalize data abstraction:

• Interfaces are existential types that provide a collection of operations on an unspecified, or abstract, type.
• Implementations are packages, the introduction form of existential types.
• Clients are uses of the corresponding elimination form.

# From Monomorphic to Polymorphic

So far we have dealt with monomorphic languages, in which every expression has a unique type. If we want the same behavior over different types, different programs have to be written.

The expression patterns codify generic (type-independent) behaviors that are shared by all instances of the pattern. Such generic expressions are polymorphic.

This part is not shown on PFPL preview version.

# Some Forms of Recursive Type

Inductive and coinductive types are two important forms of recursive type.

For an inductive type, the elements are intuitively those given by finite compositions of its introduction forms.

If we specify the behavior of a function on each introduction form of an inductive type, then the behavior is defined for all values of the type.

Such a function is a recursor, or catamorphism.

It is in fact a generalized fold.

The elements of a coinductive type are those that behave properly in response to finite compositions of its elimination forms.

If we specify the behavior of an element on each elimination form, then we have fully specified a value of that type.

Such an element is a generator, or anamorphism.

It is in fact a generalized unfold.
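The fold/unfold reading can be sketched in Python on lists (my own illustration; in the language of this chapter these would be expressed with recursive types):

```python
# Catamorphism (generalized fold): consume a list by replacing its
# introduction forms cons/nil with step/nil.
def cata(step, nil, xs):
    acc = nil
    for x in reversed(xs):
        acc = step(x, acc)
    return acc

# Anamorphism (generalized unfold): generate a list by repeatedly
# observing a seed; None signals the end of generation.
def ana(next_state, seed):
    out = []
    while True:
        obs = next_state(seed)
        if obs is None:
            return out
        x, seed = obs
        out.append(x)

assert cata(lambda x, acc: x + acc, 0, [1, 2, 3]) == 6      # sum as a fold
assert ana(lambda n: None if n == 0 else (n, n - 1), 3) == [3, 2, 1]
```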

This part is not shown on PFPL preview version.

# Introduction

Many programs are instances of a pattern in a particular situation. Sometimes types determine the pattern by a technique called (typed) generic programming.

Example: the natural-number recursor in Chapter 9 defines the pattern of recursion on values of an inductive type, which can be expressed as a generic program.

Consider a function $f:\rho\to\rho'$. We wish to extend $f$ to a transformation from type $\lbrack \rho/t\rbrack\tau$ to type $\lbrack \rho'/t\rbrack\tau$ by applying $f$ at the spots in the input where a value of type $\rho$ occurs, obtaining a value of type $\rho'$ there and leaving the rest of the data structure alone.

For instance, $\tau$ can be $\text{nat}\times t$, and $f$ extends to a map $\text{nat}\times\rho\to\text{nat}\times\rho'$ that transforms $\langle a,b\rangle$ into $\langle a,f(b)\rangle$.
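As a Python sketch of this one instance (`map_nat_times_t` is a hypothetical helper name of mine, not from the book):

```python
# Extending f : rho -> rho' along the template t. nat x t:
# the template marks where f applies -- only the second component.
def map_nat_times_t(f, pair):
    a, b = pair
    return (a, f(b))   # <a, b> |-> <a, f(b)>

double = lambda x: x * 2
assert map_nat_times_t(double, (7, 3)) == (7, 6)  # the nat part is untouched
```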

## Ambiguity and Type Operator

The ambiguity arises from the many-one nature of substitution: a type can have the form $\lbrack\rho/t\rbrack\tau$ in many different ways, depending on how $t$ occurs in $\tau$.

The ambiguity is resolved by giving a template that marks the occurrences of $t$ in $\tau$ at which $f$ is applied.

Such a template is known as a type operator $t.\tau$, an abstractor binding the type variable $t$ in the type $\tau$.

This part is not shown on PFPL preview version.

# From Constructive Logic to Classical Logic

Constructive logic is a logic of positive evidence, as stated in last chapter.

In contrast, classical logic is a logic of perfect information where every proposition is either true or false.

This casts a "god's-eye view" of the world: there are no open problems, and every proposition resolves to true or false. The symmetry between truth and falsity is appealing, but the logical connectives are correspondingly weaker in the classical case.

As Exercise 12.1 shows, LEM is not universally valid; it is valid only in the classical case.

Constructive logic is stronger (more expressive) than classical logic, since it can express more distinctions (between affirmation and irrefutability) while remaining consistent with classical logic.

A proof in classical logic is a computation that cannot be refuted.

Computationally, a refutation consists of a continuation, or control stack, that takes a proof of a proposition and derives a contradiction. A proof in classical logic is then a computation that, when given a refutation of that proposition, derives a contradiction, witnessing the impossibility of refuting it.

This part is not shown on PFPL preview version.

# What is Constructive Logic

Constructive logic codifies the principles of mathematical reasoning as it is actually practiced.

In mathematics, a proposition is judged true exactly when it has a proof, and false exactly when it has a refutation. Since there are, and always will be, unsolved problems, we cannot expect every proposition to be either true or false.

# From Product to Sum

The binary sum type offers a choice between two types, and the nullary sum offers a choice among no types at all.

Finite sums generalize binary and nullary sums to allow an arbitrary number of cases indexed by a finite index set.

As with products, sums come in both eager and lazy variants, differing in how values of sum types are defined.
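A Python model of eager binary sums (my own sketch; tagged pairs play the role of the injections, and `case` is the elimination form):

```python
# inl/inr are the introduction forms of the binary sum tau1 + tau2.
def inl(x): return ("l", x)
def inr(x): return ("r", x)

# Case analysis is the elimination form: exactly one branch runs.
def case(s, on_l, on_r):
    tag, v = s
    return on_l(v) if tag == "l" else on_r(v)

# Eager variant: the payload is already a value when the sum is built.
s = inl(21)
assert case(s, lambda x: x * 2, lambda y: 0) == 42
```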

# From Unit to Finite Product

The binary product of two types consists of ordered pairs of values, one from each type, in the order specified.

The nullary product, or unit type, consists solely of the unique "null tuple" of no values.

The elimination forms associated with the binary product are the projections, which select the first or second component of a pair. The nullary product has no elimination form.

Product types admits both lazy and eager dynamics:

• In lazy dynamics, a pair is a value without regard to whether its components are values; the components are not evaluated until (if ever) they are accessed and used in another computation.
• In eager dynamics, a pair is a value only if its components are values; they are evaluated when the pair is created.

The finite product $\langle \tau_i \rangle_{i\in I}$ is the more general form, indexed by a finite set of indices $I$.

The components are accessed by $I$-indexed projection operations, generalizing the binary case. Similar to binary product, finite products admit both an eager and a lazy interpretation.

# From E to T

System T is the combination of function types with the type of natural numbers. T provides a general mechanism called primitive recursion.

Primitive recursion captures the essential inductive character of the natural numbers, and hence may be seen as an intrinsic termination proof for each program in the language.

Consequently we may define only total functions in the language, those that always return a value for each argument.

The proof may seem like a mere shield against infinite loops, but it is also a weapon: it can be used to show that some programs cannot be written in T.
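A Python sketch of the primitive recursor (my own, in the eager loop-based reading): each use of `rec` performs exactly $n$ steps, so termination is guaranteed by construction.

```python
# Primitive recursion over the natural numbers:
# rec(n, base, step) computes step(n-1, ... step(0, base) ...).
def rec(n, base, step):
    acc = base
    for i in range(n):   # structural descent on n: always halts
        acc = step(i, acc)
    return acc

# Addition and multiplication defined by primitive recursion, T-style.
plus = lambda m, n: rec(n, m, lambda _, acc: acc + 1)
mult = lambda m, n: rec(n, 0, lambda _, acc: plus(acc, m))
assert plus(2, 3) == 5
assert mult(4, 6) == 24
```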

This part is not shown on PFPL preview version.

# Functions as Language Extensions

A function is defined by binding a name to an abt in which a bound variable serves as the argument.

A function is applied by substituting a particular expression of suitable type for the bound variable, yielding an expression.

First-order functions have domain and range limited to num and str.

Higher-order functions permit arguments and results to be other functions.

# Other Forms of Dynamics

An evaluation dynamics is a relation between a phrase and its value, defined without detailing the step-by-step evaluation process, with notation $e \Downarrow v$.

A cost dynamics enriches evaluation dynamics with a cost measure specifying the resource usage of evaluation, written $e \Downarrow^k v$.

# Intros

Type safety expresses the coherence between the statics and the dynamics. Statics may be seen as predicting that the value of an expression will have a certain form so that the dynamics of the expression will be well-defined. Thus, evaluation will not get stuck.

The type safety theorem for E states:

• (Preservation) If $e : \tau$ and $e \longmapsto e'$, then $e': \tau$.
• (Progress) If $e : \tau$, then either $e \text{ val}$ or $\exists e'$ such that $e \longmapsto e'$.

Safety is the conjunction of preservation and progress.

$e \text{ stuck}$ iff $\neg (e \text{ val})$ and $\not\exists e'$ such that $e \longmapsto e'$. By the safety theorem, a stuck state is necessarily ill-typed.

# Dynamics Keywords

Dynamics of a language describes how programs are executed.

• Structural Dynamics: defines a transition system that inductively specifies the step-by-step process of executing a program.
• Contextual Dynamics: A variation of structural dynamics.
• Equational Dynamics: A collection of rules defining when one program is definitionally equivalent to another.

# PL Phase Distinction

There are two phases defined in PLs:

• Static
  • Parsing and type checking, to ensure the program is well-formed.
• Dynamic
  • Well-behaved execution of well-formed programs.

A language is said to be safe when well-formed and well-behaved execution are both achieved.

The static phase is specified by a statics comprising a set of rules for deriving typing judgments, which state that an expression is well-formed of a certain type.

Type safety tells us that this prediction is correct: if the types fit together in a program, then certain aspects of its run-time behavior are guaranteed to be safe.

This note will use PFPL expression language E as example.