Vapnik-Chervonenkis density: from uniform convergence to Berkovich analytification and stably dominated types

This post is about the notion of Vapnik-Chervonenkis dimension — a concept that arose more-or-less simultaneously in probability theory, in model theory (the part of mathematical logic dealing with abstract structures), and in combinatorics. It is fascinating how this concept which just has to do with set systems (with no additional structure) ends up having connections with so many different areas of mathematics — including very recent developments in model theory due to Hrushovski and Loeser.

The goal of this post is to give readers (with diverse backgrounds) a glimpse of these connections/applications. I have skimped on many details obviously — the idea being to show just enough to whet the appetite of the reader to learn more about the topic.

Since the post is somewhat long I have organized it into sections and the reader might jump directly to the section most interesting to her. The sections are independent of each other (modulo the initial definitions and notation provided in Section 1). They are enumerated as follows.

  1. In Section 1 I give the basic definitions and discuss the famous “Dichotomy” lemma.
  2. In Section 2 I discuss applications to probability theory.
  3. In Section 3 I discuss applications to discrete and computational geometry.
  4. In Section 4 I discuss connections with model theory (assuming no background in logic).
  5. In Section 5 I relate VC density with the more geometric notion of {0/1}-patterns.
  6. In Section 6 I discuss the “cohomological method” for bounding the VC codensity of (or equivalently the number of {0/1}-patterns realized by) a family of subsets of a given set.
  7. In Section 7 I review briefly some history of the quantitative problem of obtaining tight upper bounds on the VC codensity of definable families in different structures.
  8. Finally, in Section 8 I discuss recent work with Deepam Patel on obtaining tight upper bounds on the VC density of definable families over a non-archimedean valued field.

Standard disclaimer: It goes without saying that the views expressed are mine and may not reflect those of my co-authors/co-conspirators. The LaTeX to WordPress conversion was done using the latex2wp package written by Luca Trevisan. Thanks Luca!

1. Origin and Definition

The Vapnik-Chervonenkis (henceforth VC) dimension (as well as density) of a set {\mathcal{C}} of subsets of a set {X} is a combinatorial notion which is motivated independently by several different areas of mathematics. Its origin can be traced to probability theory (Vapnik and Chervonenkis), model theory (Shelah), and combinatorics (Sauer) — all around the same time.

For any subset {D} of {X}, we denote

\displaystyle S(D;\mathcal{C}) = \{D \cap C \mid C \in \mathcal{C}\}.

We say that a (finite) subset {D} of {X}, is shattered by {\mathcal{C}}, if

\displaystyle S(D;\mathcal{C}) = 2^D.

The VC dimension, {\mathrm{vcdim}(\mathcal{C})}, is the largest {n}, such that there exists a subset {D \subset X} with {\mathrm{card}(D)=n} which is shattered by {\mathcal{C}}. If no such {n} exists, we set {\mathrm{vcdim}(\mathcal{C}) = \infty}.

The definition of VC definition is best motivated by the following lemma due (independently) to Sauer and Shelah.

Sauer, N. (1972), “On the density of families of sets”, Journal of Combinatorial Theory, Series A, 13: 145-147, doi:10.1016/0097-3165(72)90019-2.

Shelah, Saharon (1972), “A combinatorial problem; stability and order for models and theories in infinitary languages”, Pacific Journal of Mathematics, 41: 247-261, doi:10.2140/pjm.1972.41.247.

In the statement of the lemma and later we will use the convenient notation

\displaystyle D \subset_n X

to denote

\displaystyle D \subset X \mbox{ and } \mathrm{card}(D) = n.

Lemma 1 (Dichotomy) Suppose that {\mathrm{vcdim}(\mathcal{C}) = d < \infty}. Then for all {D \subset_n X}, with {n > d},

\displaystyle \mathrm{card}(S(D,\mathcal{C})) \leq \sum_{0 \leq i \leq d} \binom{n}{i}.

In particular, there exists {c > 0}, such that for all {n>0},

\displaystyle \mathrm{card}(S(D,\mathcal{C})) \leq c \cdot n^d.

 

The name “Dichotomy” is justified by the following observation.

We denote

\displaystyle \pi_{\mathcal{C}} (n) = \sup_{D \subset_n X} \mathrm{card}(S(D;\mathcal{C})). \ \ \ \ \ (1)

Lemma 1 implies that exactly one of the following two alternatives is true.

  1. {\pi_{\mathcal{C}}(n) = 2^n \mbox{ for all } n};
  2. there exists {d > 0, c > 0} such that {\pi_{\mathcal{C}}(n) \leq c \cdot n^d}.

The VC density, denoted {\mathrm{vcdensity}(\mathcal{C})}, of {\mathcal{C}} is defined by

\displaystyle \mathrm{vcdensity}(\mathcal{C}) = \limsup_{n} \frac{ \log(\pi_\mathcal{C}(n))}{\log(n)}.

It is an immediate consequence of Lemma 1 that:

\displaystyle \mathrm{vcdensity}(\mathcal{C}) \leq \mathrm{vcdim}(\mathcal{C}).

The above inequality can be strict. However, it follows from Lemma 1 that that

\displaystyle \mathrm{vcdim}(\mathcal{C}) < \infty \Leftrightarrow \mathrm{vcdensity}(\mathcal{C}) < \infty.

Later on in this post we will focus mostly on VC density rather than on VC dimension, since VC density can be related to more geometric notions (such as {0/1}-patterns to be defined later).

2. Application in probability theory

The original motivation that led Vapnik and Chevonenkis to define VC dimension came from probability theory — namely, to provide necessary and sufficient condition for uniform law of large numbers to hold in certain situation. We describe this briefly below.

Let {X} be a sample space on which a probability measure is defined, and {\mathcal{C}} be a collection of measurable subsets of {A}. If {x_1,\ldots,x_n} are samples drawn independently from {X} with the given probability measure, then for every fixed {C \in \mathcal{C}}, the random variable

\displaystyle \left|\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right|

goes to {0} in probability as {n \rightarrow \infty}. This is just the law of large numbers.

However, if we want the convergence to zero to be uniform i.e. we want the random variable

\displaystyle \sup_{C \in \mathcal{C}} \left(\left|\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right|\right)

to go to {0} in probability as {n \rightarrow \infty}, then we need to impose further conditions (in the way we have stated this it is not clear that that the function on the left hand side is measurable, but let us assume that it is. It will be measurable if we should assume that {\mathcal{C}} is countable for instance).

Vapnik and Chervonenkis proved in their paper:

Vapnik, V. N; Chervonenkis, A. Ya. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, Theory of Probability and its Applications; Philadelphia Vol. 16, Iss. 2, (1971): 17. DOI:10.1137/1116025

that a suffcient condition for the uniform convergence is

\displaystyle \mathrm{vcdim}(\mathcal{C}) < \infty.

More precisely they proved (Theorem 2, loc. cit.) that for every {\varepsilon > 0}\displaystyle \mathbb{P}\left( \sup_{C \in \mathcal{C}} (\left\vert\frac{\mathrm{card}(\{x_1,\ldots,x_n\} \cap C)}{n} - \mathbb{P}(C)\right\vert\right) > \varepsilon) < 4 \cdot \pi_{\mathcal{C}}(2 n) \cdot \exp(-\varepsilon^2 n/8).

In particular, if {\mathrm{vcdim}(\mathcal{C})} is finite, and hence {\pi_{\mathcal{C}}(n)} (cf. (1)) is polynomially bounded in {n} by Lemma 1, then we have uniform convergence to {0} in probability.

3. Application in discrete and computational geometry

VC dimension has played an extremely important role in discrete and computational geometry — in particular, in the algorithmic problem often referred to as “range searching”.

Reusing terminology from before let {X} be a set and {\mathcal{C} \subset 2^X}. In the context of computational geometry, {X} is often a finite dimensional real vector space, and elements of {\mathcal{C}} are often a class of semi-algebraic sets having “constant description complexity” (for example, {X = \mathbb{R}^2}, and {\mathcal{C}} the set of all unit disks, or half spaces, or triangles etc.). Elements of {\mathcal{C}} are referred to as “range” sets. The range searching problem of takes as input given a finite {S} set of {n} points in {X}. The goal is to “preprocess” them, so that given a query “range” {C \in \mathcal{C}}, one can report quickly properties of the set {S\cap C}: for example, the cardinality of {S \cap C}, or whether {S \cap C} is empty or not. One can find details in the survey article:

Agarwal, Pankaj K. “Simplex range searching and its variants: a review” A journey through discrete mathematics, 1–30, Springer, Cham, 2017.

A key tool in efficient solutions proposed for the range searching problem is the notion of of {\varepsilon}-nets (and the related notion of {\varepsilon}-approximations). The idea is to represent the given set {S} with a very small sized “representative” subset in {X} which is guaranteed to have good properties with respect to the range sets in {\mathcal{C}}.

Definition 2 Given a pair {(X,\mathcal{C})}, and {\varepsilon, 0 \leq \varepsilon \leq 1}, and {S} a finite subset of {X}. A subset {Y \subset X} is called an {\varepsilon}-net (resp. {\varepsilon}-approximation) for {S} if {Y \cap C \neq \emptyset} for all {C \in \mathcal{C}} with {\mathrm{card}(S \cap C) \geq \varepsilon \mathrm{card}(S)} (resp. if {\left| \frac{\mathrm{card}(Y \cap C)}{\mathrm{card}(Y)} - \frac{\mathrm{card}(S \cap C)}{\mathrm{card}(S)} \right| < \varepsilon} for all {C \in \mathcal{C}}).

The important result is (which we state in terms of finite sets or equivalently for atomic measures than for more general measure spaces) is the following. The VC dimension plays of the range {\mathcal{C}} of course plays a critical role in it.

Theorem 3 If {\mathrm{vcdim}(\mathcal{C}) = d < \infty}, then there exists {c > 0}, such that for every {\varepsilon \in (0,1)}, and finite subset {S} of {X}, there exists an {\varepsilon}-net (resp. an {\varepsilon}-approximation) for {S} of cardinality at most {c \cdot \frac{d}{\varepsilon}\log \frac{1}{\varepsilon}} (resp. {c \cdot \frac{d}{\varepsilon^2}\log \frac{1}{\varepsilon}}).

Haussler, David; Welzl, Emo {\varepsilon}-nets and simplex range queries. Discrete Comput. Geom. 2 (1987), no. 2, 127-151.

Komlos, Janos; Pach, Janos; Woeginger, Gerhard Almost tight bounds for {\varepsilon}-nets. Discrete Comput. Geom. 7 (1992), no. 2, 163-173.

The key point of course is the fact that the cardinality of the {\varepsilon}-net (resp. {\varepsilon}-approximations) can be bounded independent of the cardinality of {S}. Thus, with respect to the “range space” {(X,\mathcal{C})}, any finite set {S} can be replaced by an “approximation” (i.e. an {\varepsilon}-net or an {\varepsilon}-approximation) of size that depends on {\varepsilon} and the pair {(X,\mathcal{C})} but independent of the cardinality of {S}. This is what makes the above theorem key to many results in computational geometry. Theorem 3 also has an effective (algorithmic) variant. In turns out that there is a highly efficient randomized algorithm that can compute an {\varepsilon}-net with high probability — and together these lead to efficient solutions to the range searching problem.

4. Application in model theory

In model theory the notion of VC-density is connected with the so called “independence property” of formulas and models, which in turn is related to the property of of stability which is of fundamental importance in model theory.

Mathematical logic and model theory has its own set of notation and definitions — some of which may not be completely familiar to non-logicians. In the following paragraphs I condense in half-a-page what usually takes a couple of lectures and several pages worth of writing to write “properly” — hopefully with no loss of content. I assume familiarity with the definition of a “first-order formula” in some language but not much more.

Given a language {L} (i.e. a tuple of symbols representing a set, constants, functions, and relations) an {L}-structure {\mathcal{M}} is a tuple consisting of a set, {M} along with a set of constants, functions and relations on {M} interpreting the corresponding symbols of {L}. So if {L = (0,1,+,\cdot)} (also referred to as the language of rings with a unit element), then the tuple {(\mathbb{Z}, 0,1,+,\cdot)} with the usual interpretations of the symbols “{0}”, “{1}”, “{+}”, “{\cdot}” is an {L}-structure.

If {L = (0,1,+,\cdot)}, then is

\displaystyle \forall y \exists z (y\cdot y + z\cdot z = x)

is an example of an {L}-formula having {\{x\}} as its set of free variables. An {L}-formula with an empty set of free variables is an {L}-sentence. (The reader can now come up with the proper definitions of {L}-formulas and sentences.)

A theory {T} is a set of {L}-sentences. An {L}-structure {\mathcal{M}} is a model of {T} (denoted {\mathcal{M} \models T}) if every sentence in {T} is “true” in the structure {\mathcal{M}} (the reader can supply what it means to be be “true” in {\mathcal{M}}). A theory is called consistent if it admits a model. A theory {T} is complete if every for every {L}-sentence {\phi}, either {\mathcal{M} \models \phi} for every model {\mathcal{M}} of {T}, or {\mathcal{M} \models \neg \phi} for every model {\mathcal{M}} of {T}.

Quantifier elimination

Notice that a formula can have quantifiers. If every formula is equivalent {\mod T} to a formula without quantifiers then we say that the theory {T} admits quantifier elimination. (Two formulas {\phi(x),\psi(x)} are equivalent {\mod T} iff {T \vdash \phi \leftrightarrow \psi} or equivalently iff {\mathcal{M} \models \forall x \phi(x) \leftrightarrow \psi(x)} for every model {\mathcal{M}} of {T}.) It is a classical result in algebraic geometry (often referred to in non-model theory context as the Lefschetz principle) that the theory of algebraically closed fields admits quantifier elimination. The theory of real closed ordered fields admits quantifier elimination (in the language of {(0,1,+,\cdot, <)} of ordered fields) — this was proved by Tarski. Th property of quantifier elimination depends on the language. So the theory of real closed fields (without the order relation) does not admit quantifier elimination. Why ?

Types

One notion that we will not discuss much, but which plays a key role in the application that we describe in Section 8, is that of types. Given an {L}-structure {\mathcal{M} = (M,\cdots)}, a subset {A \subset M}, and {n>0}, an {n}-type {p} over {A} is a consistent set of {n}-ary {L}-formulas with parameters in {A} (this is the same as saying that there exists an elementary extension {\mathcal{N}} of {\mathcal{M}} and {a \in \mathcal{N}} such that {\mathcal{N} \models \phi(a)} for every {\phi \in p}.) An {n}-type over {A} is called complete if for every {n}-ary formula {\phi} with parameters in {A}, either {\phi \in p} or {\neg \phi \in p}. The set of {n}-types over {A} is denoted {S_n(A)}. The type space {S_n(A)} carry a natural topology referred to as the Stone topology.

Types in model theory are a device for producing a topological space out of (a model of) a theory {T}, analogous to how algebraic geometers produce the topological space {\mathrm{Spec}(A)} (equipped with the Zariski topology) out of a commutative ring {A}. In fact it is more than an abstract analogy. If {K} is a model of the theory of algebraically closed fields, then the space of complete {n}-types, {S_n(K)}, is indeed in bijection with {\mathrm{Spec}(K[X_1,\ldots,X_n])} (it is a nice do-able exercise to come up with this bijection just from what has been said above). Moreover, the natural bijective map {S_n(K) \rightarrow \mathrm{Spec}(K[X_1,\ldots,X_n])} is continuous (with the Stone topology on the left and the Zariski topology on the right). Its inverse though is not continuous (the Stone topology is much finer than the Zariski topology).

Let us now fix a language {L}, and a {L}-theory {T}, and let {\mathcal{M}= (M,\cdots) } be a model of {T}. A subset {X} of {M^n} is said to be definable if there exists a {L}-formula {\phi(x)} with {|x| = n}, such that {X = \{a \in M^n \mid \mathcal{M}\models \phi(a)\}} (in model theory lingo it is usual to write in this case {X = \phi(\mathcal{M})}). We also say {X} is definable with parameters if there exists a formula {\phi(x,y)}, such that there exists {b \in M^{|b|}} such that {X = \phi(\mathcal{M},b)}. The set of subsets {\{\phi(\mathcal{M},b) \mid b \in M^{|y|} \}} is a definable family of subsets of {M^{|x|}}.

(One final piece of warning about notational abuse. Model theory textbooks often drop the exponents in formulas like {a \in M^{|a|}} and just write {a \in \mathcal{M}} even when {|a| > 1}. This is harmless and does not cause any ambiguity.)

O-minimal structures

One class of structures that is playing an increasing important role in applications of model theory to other areas of mathematics (Diophantine geometry, Hodge theory amongst others), and which also plays an important role in the “VC story” to which this post is devoted, is the class of o-minimal structures.

Let {L=(<,\cdots) } be a language having a binary relation symbol {<}. Then, an {L}-structure {\mathcal{M} = (M,<,\cdots)} is said to be o-minimal if the reduct {(M,<)} is a model of the theory of dense linear order, and the set of definable subsets (with parameters) of {M^1=M} is the smallest possible — namely, finite unions of points and intervals (such sets are clearly definable with parameters in the ordered structure {(M,<)}). This explains the name “order-minimal” or o-minimal. The most interesting examples of o-minimal structures are those that expand the o-minimal structure {(\mathrm{R},<,+,\cdot)} where {\mathrm{R}} is a real closed field. Gabrielov proved that there exist strict expansions of {(\mathbb{R},<,+,\cdot)} (as his undergraduate thesis work in Moscow State University long before the name “o-minimal” was invented!), and many other structures have been shown to be o-minimal since then. The terminology of “o-minimality” was introduced by Pillay and Steinhorn.

Gabrielov, A. M. Projections of semianalytic sets. (Russian) Funkcional. Anal. i Priložen. 2 1968 no. 4, 18-30.

Pillay, Anand; Steinhorn, Charles Definable sets in ordered structures. Bull. Amer. Math. Soc. (N.S.) 11 (1984), no. 1, 159-162.

The definable sets in an o-minimal structure share many of the tameness properties of real semi-algebraic sets (uniform finiteness in families of topological invariants such as the number of connected components or the total Betti number, local conical structure etc.). As such they provide a natural setting for studying geometric problems without having to worry about wild oscillatory functions, for example {\sin(1/x)} over {x > 0}.

A standard source book for o-minimal structures is:

van den Dries, Lou. Tame topology and o-minimal structures. London Mathematical Society Lecture Note Series, 248. Cambridge University Press, Cambridge, 1998. x+180 pp. ISBN: 0-521-59838-9

Stability and independence

We now discuss the (related) notions of stability and independence of theories. We start with the former since stability is the more important of the two notions (but we will soon see that it is independence that has to do with VC theory).

Let {\phi(x,y)} be a {L}-formula (here {x,y} are tuples of free variables). The formula {\phi} is said to be stable if there does NOT exist sequences {(a_i)_{i \in \omega}, (b_i)_{i \in \omega}} in some model of {\mathcal{M}} of {T}, satisfying {\mathcal{M} \models \phi(a_i,b_j) } if and only if {i < j}. The theory {T} is said to be stable if every {\mathcal{L}}-formula {\phi(x,y)} is stable. In other words, {T} is stable if it is not possible to order infinite sequences using a formula (modulo {T}) in the above sense. The theory of real closed fields is not stable — take {\phi(x,y) := \exists z \neg(z =0 ) \wedge (y = x + z^2)} (which expresses {x < y}), and let {a_i = b_i = i \in \mathbb{R}}. On the other hand the theory of algebraically closed fields is stable. The stability property of theories has several equivalent definitions. For example, stability can be defined in terms of cardinalities of type spaces (“stable theories have few types”). These different definitions seem to have nothing to do with each other on the surface (the hallmark of most important notions in mathematics).

We next define the notion of independence. We say that a formula {\phi(x,y)} has the independence property if there exists a sequence {(a_i)_{i \in \omega}} in some model of {\mathcal{M}} of {T}, such that for all {\tau \subset \omega}, the set of formulas {\{\phi(x,a_i) \mid i \in \tau\} \cup \{\neg \phi(x,a_i) \mid i \not\in \tau\}} is consistent. A theory is said to “not have the independence property” (NIP for short) if no {\mathcal{L}}-formula {\phi(x,y)} has the independence property. If a theory has the independence property then the theory cannot be stable (Exercise !). However, the converse is not true — NIP theories need not be stable. For example, the theory of real closed fields is NIP but not stable. The following theorem characterizes the gap between being NIP and stable. We say that a formula {\phi(x)} has the strict order property if there exists a sequence {(a_i)_{i \in \omega}} in some model {\mathcal{M}} of {T}, such that {\mathcal{M} \models \phi(x,a_i) \rightarrow \phi(x,a_j)} if and only if {i < j}.

Theorem 4 (Shelah) A theory is not stable if and only if there exists a formula {\phi(x,y)} having the independence property or the strict order property.

A good source for NIP theories is the book:

Simon, Pierre A guide to NIP theories. Lecture Notes in Logic, 44. Association for Symbolic Logic, Chicago, IL; Cambridge Scientific Publishers, Cambridge, 2015. vii+156 pp. ISBN: 978-1-107-05775-3

NIP and finiteness of VC density

Finally, coming back to VC-density — it is an easy exercise to see that if {\mathcal{M}} is a model of {T}, and {\phi(x,y)} a {L}-formula — then {\phi(x,y)} not having the independence property is equivalent to the statement that the definable family {\{\phi(a,\mathcal{M}) \mid a \in \mathcal{M}\}} of subsets of {M^{|b|}} has finite VC-density. In particular, any definable family of sets in an NIP structure has finite VC dimension/density.

5. VC-density and {0/1}-patterns

For any set {V} and a finite set of subsets {\mathcal{C} = \{V_1,\ldots,V_n\}} of {V}, the set of {0/1} -patterns realized by {\mathcal{C}} is the subset {\Sigma(V,\mathcal{C})} of {\{0,1\}^{[1,n]}} defined by

\displaystyle \Sigma(V,\mathcal{C}) = \{\sigma \in \{0,1\}^{[1,n]} \mid \exists v \in V \mbox{ such that } \sigma(i) = \chi_{V_i}(v) \},

where {\chi_{V_i}(\cdot)} is the ordinary characteristic function of the set {V_i}.

Let {V,W} be sets and {X \subset V \times W}, and let {\pi_V:X \rightarrow V, \pi_W:X \rightarrow W} be the two projections. We will abuse notation slightly and denote for {v \in V} (resp. {w \in W}) {X_v = \pi_V^{-1}(v)} (resp. {X_w = \pi_W^{-1}(w)}) by {X_v} (resp. {X_w}). Let {\mathcal{C} = \{ \pi_V(X_w) \mid w \in W\} \subset 2^V} and {\mathcal{D} = \{\pi_W(X_v) \mid v \in V\} \subset 2^W}.

It is an easy exercise now to check that for {w_1,\ldots,w_n \in W},

\displaystyle \mathrm{card}(\Sigma(V,\{X_{w_1},\ldots,X_{w_n}\})) = \mathrm{card}(S(\{w_1,\ldots,w_n\},\mathcal{D})),

Denoting

\displaystyle F(n) = \max_{(w_1,\ldots,w_n) \in W^n} \mathrm{card}(\Sigma(V,\{X_{w_1},\ldots,X_{w_n}\})),

we have

\displaystyle \mathrm{vcdensity}(\mathcal{D}) = \limsup_{n} \frac{\log F(n)}{n}.

In particular, upper bounds on the cardinalities of the set of {0/1}-patterns realized by finite subsets of {\mathcal{C}}, gives upper bound on the VC density of the family {\mathcal{D}}. The VC density of the definable family {\mathcal{D}} is then called the VC codensity of the definable family {\mathcal{C}}.

Example 1 Let {V = W = \mathbb{R}^2}, and {X = \{((x,y),(a,b)) \in V \times W \mid (x-a)^2 + (y-b)^2 \leq 1\}}. Then, {\mathcal{C}} is the set of all closed unit disks in the {V}, and {\mathcal{D}} the set of all closed unit disks in {W}.

Proving tight upper bounds on the VC codensities of (or equivalently {0/1}-patterns realized by) definable families in various structures has attracted a lot of attention of both combinatorialists and model theorists. In many combinatorial applications, upper bounds on the number of {0/1}-patterns or sign patterns if the underlying field is real give upper bounds on the number of interesting objects — classical applications of such methods include proving upper bounds on the number of realizable oriented matroids, combinatorial types of polytopes etc.

The very first application of this type that I am aware of is:

Goodman, Jacob E.; Pollack, Richard. There are asymptotically far fewer polytopes than we thought. Bull. Amer. Math. Soc. (N.S.) 14 (1986), no. 1, 127-129.

More recently, results in incidence combinatorics (such as the Szemeredi-Trotter theorem and its various generalizations) which have been traditionally studied over the real and complex fields, have being generalized to more general model-theoretic structures. Upper bounds on VC density/codensity often play a key role in such questions.

For the rest of this post we will now switch to the more geometric language of {0/1}-patterns and explain how techniques from topology and algebraic geometry help to prove tight upper bounds on the number of {0/1}-patterns.

6. Bounding the number of {0/1}-patterns: what has cohomology to do with it ?

It is instructive to consider the following continuation of Example 1.

To illustrate the main idea consider the problem of bounding the number of {0/1}-patterns realized by {n} closed unit disks in {\mathbb{R}^2} (cf. Example 1), assuming that the boundaries of the disks intersect transversally.

It is not difficult to see that the number of {0/1}-patterns in this case is bounded by the number of connected components of the complement of the union of the boundaries of these disks.

So it suffices to bound this number. We make another simplification for exposition. It makes no difference topologically since all the boundaries are compact to assume that the arrangement of {n} circles is on the sphere {\mathbb{S}^2} rather than on {\mathbb{R}^2} (by taking a one-point compactification of the plane if you wish).

Let the {n} unit circles be denoted {S_1,\ldots,S_n}. Using Alexander duality,

\displaystyle b_0(\mathbb{S}^2 - \bigcup_{i} S_i) = b_1(\bigcup_i S_i) +1,

where I am denoting by {b_i(\cdot)} the dimension of the {i}-th cohomology group with rational coefficients. To bound {b_i(\bigcup_i S_i)} we make use of a spectral sequence argument. We consider the Mayer-Vietoris spectral sequence (more precisely the spectral sequence of the row-wise filtration of the Čech complex) of the covering {\{S_i\}_{i \in [1,n]}} of the union {\bigcup_{i \in [1,n]} S_i}:

\displaystyle E_r^{p,q} \Rightarrow \mathrm{H}^{p+q}(\bigcup_{i \in [1,n]} S_i).

The {E_1}-page of this spectral sequence is given by

\displaystyle E_1^{p,q} = \bigoplus_{I \subset_{p+1} [1,n]} \mathrm{H}^q(S_I),

where for {I \subset [1,n]},

\displaystyle S_I = \bigcap_{i \in I} S_i.

Observe that since in any spectral sequence,

\displaystyle \dim E_{r}^{p,q} \geq \dim E_{r'}^{p,q}, r \leq r',

we have for every {k \geq 0}

\displaystyle \dim \mathrm{H}^k(\bigcup_{i \in [1,n]} S_i) = \sum_{p+q=k} \dim E_{\infty}^{p,q} \leq \sum_{p+q=k} \dim E_{1}^{p,q}.

In partiular, taking {k=1},

\displaystyle b_1(\bigcup_i S_i) = \dim E_\infty^{0,1} + \dim E_\infty^{1,0} \leq \dim E_1^{0,1} + \dim E_1^{1,0},

it suffices to bound the dimensions of the groups {E_1^{0,1}} and {E_1^{1,0}} separately. Now,

\displaystyle  E_1^{0,1} = \bigoplus_{i \in [1,n]} \mathrm{H}^1(S_i).

In our case \dim \mathrm{H}^1(S_i) = 1 for each i , and so

\displaystyle \dim E_1^{0,1} = \binom{n}{1} \cdot 1 = n. ,

Also,

\displaystyle  E_1^{1,0} = \bigoplus_{1 \leq i < j \leq n} \mathrm{H}^0(S_i \cap S_j) , and in our case \dim \mathrm{H}^0(S_i \cap S_j) = 0 \mbox{ or } 2 for each pair i < j,   and so

\displaystyle \dim E_1^{1,0} \leq \binom{n}{2} \cdot 2.

Putting everything together we obtain

\displaystyle b_1(\bigcup_i S_i) \leq n + \binom{n}{2}\cdot 2 = n^2.

This implies that the number of {0/1}-patterns realized by {n} closed unit disks (whose boundaries intersect transversally) is bounded by {n^2+1}.

The main things to observe about the above argument are the following.

  1. The bound on the {E_1}-terms of the spectral sequence is a product of two quantities, namely —
  1. A binomial coeficient counting the number of summands in the direct sum of case. (This number grows with {n} and has a polynomial dependence. The maximum degree of this polynomial amongst the various terms ({E_1^{1,0}} in our case) provides an upper bound on the VC codensity (of the family {\mathcal{C}}).)
  2. An upper bound on a certain Betti number of a set obtained as the intersection of a bounded number (the bound does not grow with {n}) of sets obtained in some way from the original sets (taking boundaries in this instance). (There is an absolute upper bound on this part which is independent of {n}.)
  • Secondly, what we actually ended up bounding is the sum of zero-th Betti number (i.e. the number of connected components) of the realizations of all the {0/1}-patterns realized by the set {\mathcal{D} = \{D_1,\ldots,D_n\}}. If we denote for {\sigma \in \{0,1\}^{[1,n]}}

    \displaystyle \mathcal{R}(\sigma) = \{x \in \mathbb{R}^2 \mid \sigma(i) = \chi_{D_i}(x), 1 \leq i \leq n\},

    we have proved:

    \displaystyle \sum_{\sigma \in \{0,1\}^{[1,n]}} b_0(\mathcal{R}(\sigma)) \leq n^2+1. \ \ \ \ \ (2)

     

    On other hand, since {\mathcal{R}(\sigma) \neq \emptyset \Rightarrow b_0(\mathcal{R}(\sigma) > 0}, we have the inequality,

    \displaystyle \mathrm{card}(\Sigma(\mathbb{R}^2,\mathcal{D})) \leq \sum_{\sigma \in \{0,1\}^{[1,n]}} b_0(\mathcal{R}(\sigma)). \ \ \ \ \ (3)

     

    Together, (2) and (3) imply that

    \displaystyle \mathrm{card}(\Sigma(\mathbb{R}^2,\mathcal{D})) \leq n^2 +1.

     

  • Finally, a key role (which is hidden by our use of Alexander duality) is played by the fact that the cohomology groups of semi-algebraic subsets of {\mathbb{R}^2} vanishes in dimensions {2} and higher.The “spectral sequence argument” sketched above with certain extensions give a powerful method for obtaining tight bounds on VC codensities. In our example, we made some simplifications to make the exposition simpler. For example, we assumed that the boundaries of the disks intersected transversally. If this was not the case, and the circles were allowed to touch each other, then the argument would fail (the realizations of the {0/1}-patterns will not be bounded from above by the number of connected components of the complement of the union of circles. In this case one needs to replace the union of the boundaries by a more complicated system of sets (involving infinitesimal tubes of various widths around the sets of the given family). Furthermore, if the given family was not in {\mathbb{R}^2}, but in some other semi-algebraic set not necessarily non-singular, then we would not be able to use Alexander duality. So there are many technical issues that I skipped but which needs to be dealt with in order to make the argument general. These can be found in: 

    Basu, Saugata; Pollack, Richard; Roy, Marie-Francoise On the Betti numbers of sign conditions. Proc. Amer. Math. Soc. 133 (2005), no. 4, 965-974.

    7. Some History

    As mentioned earlier, the problem of obtaining tight bounds on the VC-codensity (or equivalently, a bound on the number of {0/1}-patterns) for definable families in an NIP structure has attracted a lot of attention over the years.

    For definable families of algebraic hypersurfaces in {\mathbb{F}^k} of fixed degree over a field {\mathbb{F}}, Babai, Ronyai, and Ganapathy in

    Ronyai, Lajos; Babai, Laszlo; Ganapathy, Murali K. On the number of zero-patterns of a sequence of polynomials. J. Amer. Math. Soc. 14 (2001), no. 3, 717-735.

    used a “linear algebra argument” to prove that VC-codensity of such families is bounded by {k}. This bound is easily seen to be optimal.

    For families of hypersurfaces (defined by polynomials of bounded degrees) on any real variety of {V}, it is shown in

    Basu, Saugata; Pollack, Richard; Roy, Marie-Francoise. On the Betti numbers of sign conditions. Proc. Amer. Math. Soc. 133 (2005), no. 4, 965-974.

    using the topological method outlined in Section 6 that the VC-codensity is bounded by {\dim_{\mathbb{R}} V}.

    Finally, it was shown in

    Basu, Saugata. Combinatorial complexity in o-minimal geometry. Proc. Lond. Math. Soc. (3) 100 (2010), no. 2, 405-428.

    that the VC codensity of any definable family of subsets of a definable set {V} in an arbitary o-minimal expansion of a real closed field is bounded by {\dim V}.

    (Note that some of the papers listed above prove more refined and general upper bounds on the Betti numbers of the realizations of {0/1}– patterns or sign patterns — where there is also explicit dependence on the family (such as on the degrees of the polynomials defining the hypersurfaces). The bound on the VC codensity is extracted from the exponent of “{n}” in the bound on the zero-th Betti number as in Section 6.)

    8. VC codensity bounds over valued fields

    A class of structures that is important in model theory (and indeed other areas of mathematics, such as number theory) are valued fields. We are mostly interested in fields {K}, with a non-archimedean valuation {|\cdot|: K - \{0\} \rightarrow \Gamma}, where {\Gamma} is an ordered group. Tha valuation {|\cdot|} is assumed to be multiplicative and satisfy the ultrametric inequality — namely, {|x + y| \leq \max(|x|,|y|)}.

    (Note that we are following model theory convention and writing the valuation multiplicatively instead of additively which is more standard).

    The most familiar example of a valued field (with a non-archimedean valuation) is the field {\mathbb{Q}} with the {p}-adic valuation, {|\cdot|_p} for some prime number {p}. Writing a non-zero rational number {\frac{a}{b}}

    \displaystyle \frac{a}{b} = p^\nu \cdot \frac{c}{d}, \mbox{ where } (p,c) = (p,d) =1,

    the {p}-adic valuation is given by

    \displaystyle \left|\frac{a}{b} \right|_p = p^{-\nu}.

    Given a valued field {(K,|\cdot|)}, its valuation ring {A} is defined as {A = \{ x \in K | |x| \geq 1\}}. The valuation ring {A} has a unique maxmal ideal {\mathfrak{m} = \{x \in A \mid |x| > 1\}}, and the quotient field {k= A/\mathfrak{m}} is called the residue field of {(K,|\cdot|)}. The residue field of {\mathbb{Q}_p} is isomorphic to {\mathbb{Z}/p \mathbb{Z})}. The pair {(\mathrm{char}(K),\mathrm{char}(k))} is called the characteristic pair of {(K, |\cdot|)}. The characteristic pair of a valued field can be {(0,0), (0,p)}, or {(p,p)} where {p} is a prime. The characteristic pair of {(\mathbb{Q},|\cdot|_p)} is clearly {(0,p)}. The fields of Puiseux series {\bigcup_{n > 0} F(T^{1/n})} with coefficients in a field {F}, furnishes examples of valued fields with characteristic pairs {(0,0)} or {(p,p)} depending on the characteristic of {F}.

    From the model theory perspective, the structure of a valued field is an example of a two-sorted structure (as opposed to the one-sorted structures defined in Section 4). This means that there are two sorts of variables — of the field sort and the value group sort. The language of valued fields is correspondingly two sorted and is the tuple {(0_K,1_K,+_K,\cdot_K, \cdot_\Gamma,<_\Gamma)} where the subscripts {K,\Gamma} refer to the two sorts — field and value group respectively. It is a non-trivial theorem that the two-sorted theory of algebraically closed valued fields admit quantifier elimination (in the corresponding two-sorted language). Thus if {(K,\Gamma)} is a model of the theory algebraically closed valued fields, then every definable subset of {K^m} can be described by a quantifier-free formula, whose atoms are of the form {|F(x)| \leq \lambda \cdot |G(x)|}, where {F,G \in K[x]} and {\lambda \in \Gamma}.

    A good source for the model theory of algebraically closed valued fields (and also for what comes next) is:

    Haskell, Deirdre; Hrushovski, Ehud; Macpherson, Dugald. Stable domination and independence in algebraically closed valued fields. Lecture Notes in Logic, 30. Association for Symbolic Logic, Chicago, IL; Cambridge University Press, Cambridge, 2008. xii+182 pp. ISBN: 978-0-521-88981-0

    If {K} is an algebraically closed valued field (for example {K = \overline{\mathbb{Q}_p}, \mathbb{C}((T)), \overline{\mathbb{F}}_p((T))}), then the problem of obtaining tight bounds on the VC-codensity for definable families in {K^p} was considered in:

    Aschenbrenner, Matthias; Dolich, Alf; Haskell, Deirdre; Macpherson, Dugald; Starchenko, Sergei Vapnik-Chervonenkis density in some theories without the independence property, I. Trans. Amer. Math. Soc. 368 (2016), no. 8, 5889-5949.

    In particular, they prove a bound of {2 p} for the VC-codensity in the special case when the characteristic pair of {K} is {(0,0)}.

    If one contrasts this result with those in Section 7 above, one sees that the corresponding bound for the theories of algebraically closed and real closed fields, and also for any o-minimal theory, is {p} rather than {2p}. In fact, the bound in those cases equals the “cohomological dimension” (appropriately defined) of the ambient definable set for the definable family being considered.

    The method used in the paper of Aschenbrenner et. al. is algebraic and model theoretic. It is natural to consider a more topological approach. The first obstacle is that definable subsets of a valued field might not carry a nice enough topology on which the cohomological method described in Section 6 can be applied.  For various reasons (non-vanishing in dimensions bigger than the dimension of the ambient variety being one of them) ordinary sheaf cohomology with respect to the Zariski or etale site for schemes are not suitable.

    In order to get around this difficulty we first need to take a detour.

    Given an affine variety {V} defined over a complete valued field, we will embed {V} and all definable subsets of {V}, in a larger topological space having better topological properties than {V} itself. Embedding into a larger space can only increase the number of realizable {0/1}-patterns — and thus an upper bound on the number of {0/1}-patterns realized in the larger space is still an upper bound on the corresponding number of the original family. The technical tool to achieve this is also known as Berkovich analytification introduced by Berkovich.

    Berkovich analytification

    Suppose that {K} is a complete valued field with a non-trivial non-archimedean valuation.

    Let {R} be an integral domain. A multiplicative, non-archimedean seminorm on {R} is a map {|\cdot|: R \rightarrow \mathbb{R}_{\geq 0}} satisfying, {|0| = 0, |1| =1}, and {|a b| = |a||b|}, and the ultrametric inequality, {|a + b| \leq \max(|a|,|b|)}.

    Now, let {V = \mathrm{Spec}(R)} be an affine {K}-variety. Berkovich associates to {V} a space {V^{\mathrm{an}}} in the following way. The points of {V^{\mathrm{an}}} are multiplicative semi-norms {|\cdot|: R \rightarrow \mathbb{R}_{\geq 0}}, whose restriction to {K} equals the given valuation on {K} (semi-norms on fields are valuations as defined previously). For each {x \in V^{\mathrm{an}}}, let {|\cdot|_x} denote the corresponding semi-norm. The topology on {V^{\mathrm{an}}} is the weakest topology that makes all the evaluation maps,

    \displaystyle V^{\mathrm{an}} \rightarrow \mathbb{R}_{\geq 0}, x \mapsto |\phi|_x, \phi \in R

    continuous. Moreover, (this is key for our application), for each closed point {x \in V}, the function

    \displaystyle |\cdot|_x: R \rightarrow\mathbb{R}_{\geq 0}, \phi \mapsto |\phi(x)|

    (noticing that {\phi(x) \in K}) is a semi-norm, and this gives an embedding of the closed points of {V} into {V^{\mathrm{an}}}.

    The Berkovich analytic space {V^{\mathrm{an}}} has nice topological properties (it is locally compact for example), but it is not so easy to visualize. The Berkovich analytification of the projective line {\mathbb{P}^1_K} for example is homeomorphic to an infinite rooted tree. Moreover, for application to bounding VC codensity one has to deal with not just varieties, but definable subsets of these varieties as well.

    Berkovich, Vladimir G. Spectral theory and analytic geometry over non-Archimedean fields. Mathematical Surveys and Monographs, 33. American Mathematical Society, Providence, RI, 1990. x+169 pp. ISBN: 0-8218-1534-2

    Hrushovski-Loeser

    Fortunately, the recent break-through results of Hrushovski and Loeser gives us to get around the problems mentioned above. Hrushovski and Loeser give an alternative definition of {V^{\mathrm{an}}} as spaces of certain kind types. (As has been mentioned before, the model theorists’ method of producing spaces from rings is to define them as spaces of types — cf. discussion relating the {\mathrm{Spec}} of a polynomial ring in {n} variables with the space of complete {n}-types in Section 4 above). Moreover, associated to any quantifier-free formula with atoms of the form {|F| \leq \lambda \cdot |G|}, one has a corresponding semi-algebraic subset of {V^{\mathrm{an}}} (thought of now as a space of types). Astoundingly, Hrushovksi and Loeser prove certain key topological tameness properties of these “semi-algebraic sets” which are entirely analogous to those known in the case of o-minimal structures. Moreover, they prove that these “semi-algebraic” subsets of {V^{\mathrm{an}}} are homotopy equivalent to finite simplicial complexes of dimension at most {\dim(V)}. Therefore, their cohomological dimension is at most {\dim(V)}.

    Hrushovski, Ehud; Loeser, Francois. Non-archimedean tame topology and stably dominated types. Annals of Mathematics Studies, 192. Princeton University Press, Princeton, NJ, 2016. vii+216 pp. ISBN: 978-0-691-16169-3; 978-0-691-16168-6

    Using the results of Hrushovski and Loeser, in particular the vanishing of cohomology in dimensions greater than {\dim V}, Deepam Patel and I, were able to “import” the o-minimal techniques for proving upper bounds on Betti numbers of realizations of {0/1}-patterns to the valued field case. Available here.

    We prove the following theorem (stated a bit informally to avoid introducing more notation).

    Theorem 5 Let {K} be a algebraically closed complete non-archimedean valued field, {V} be an affine {K}-variety and {\mathcal{D}} a fixed definable family of semi-algebraic subsets of {V^{\mathrm{an}}}. Then, there exists {c > 0}, such that for each {i, 0 \leq i \leq \dim V}, and for every finite subset {\{D_1,\ldots,D_n\}} of elements of the family {\mathcal{D}}

    \displaystyle \sum_{\sigma \in \{0,1\}^{[1,n]}} b_i(\mathcal{R}(\sigma)) \leq c \cdot n^{\dim V -i},

    where

    \displaystyle \mathcal{R}(\sigma) = \{x \in V^{\mathrm{an}} \mid \sigma(i) = \chi_{D_i}(x), 1 \leq i \leq n\}.

     

    As an corollary (taking as usual {i=0}) in the above bound we obtain (using the same notation as in Theorem 5(:

    Corollary 6 The VC codensity of any definable family of definable subsets of {V} is bounded by {\dim V}.

    Corollary 6 removes the multiplicative factor of {2} from the upper bounds proved by Aschenbrenner et. al. mentioned before. Also, there is no restriction on the characteristic pair of {K} (thanks to the generality of the results by Hrushovski and Loeser).

    Archimedean vs non-archimedean

    It is a natural question to ask about what happens if we consider subsets defined by inequalities {|F(x)| \leq \lambda \cdot |G(x)|} when the {|\cdot|} is the archimedean norm — for example, when {K=\mathbb{C}}, and {|\cdot|} denotes the modulus. The answer is that for a definable family in {\mathbb{C}^p} defined by a formula having atoms of the form {|F| \leq \lambda \cdot |G|}, the VC codensity can be as large as {2 p}. An easy example is the following. Consider the definable family, {\mathcal{C}}, of unit disks in {\mathbb{C}} defined by {|z - w| \leq 1}, as {w} varies over {\mathbb{C}}. Given {n} unit disks in the plane, it is easy to check that the number of {0/1}-patterns can grow quadratically — so the VC codensity of {\mathcal{C}} is {2}. What happens to the family defined by the same formula in the non-archimedean case ? In this case, because of the ultra-metric inequality, two disks are either disjoint or equal. Hence the number of {0/1}-patterns is bounded by {n+1}, and the VC codensity is {1}.

    Epilogue

    I hope I have been able to convey the breadth of mathematical topics that the notion of VC dimensions has connections to. It remains an extremely active topic. For example, here are two very recent combinatorial papers from the arXiv in which VC dimension plays a critical role.

    Bounded VC-dimension implies the Schur-Erdos conjecture, Jacob Fox, Janos Pach, Andrew Suk

    An Efficient Algorithm for Generalized Polynomial Partitioning and Its Applications. Pankaj K. Agarwal, Boris Aronov, Esther Ezra, Joshua Zahl

    There are many open questions — some directly related and some unrelated to the the topics discussed above.

    One immediate open problem would be to extend the “cohomological method” to the {p}-adics. The problem here is that the theory of {\mathbb{Q}_p} admits quantifier elimination only in an extended language (using the so called Macintyre’s predicates for example)– and thus not all definable subsets of {\mathbb{Q}_p^m} can be defined by quantifier-free formulas with atoms of the form {|F| \leq \lambda\cdot|G|}.

    A more open ended problem (to which I will devote another post) is to “integrate” cohomology more directly inside the model theoretic framework. Some idea of what I have in mind can be found here (joint work with Deepam Patel).

 

Standard

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s