(MIT), Something for Almost Nothing: Advances in Sub-linear Time Algorithms.|
Linear-time algorithms have long been considered the gold
standard of computational endciency. Indeed, it is hard to imagine doing
better than that, since for a nontrivial problem, any algorithm must
consider all of the input in order to make a decision. However, as extremely
large data sets are pervasive, it is natural to wonder what one
can do in sub-linear time. Over the past two decades, several surprising
advances have been made on designing such algorithms. We will give a
non-exhaustive survey of this emerging area, highlighting recent progress
and directions for further research.
(IMDEA Software Institute), Computer-Aided Cryptographic Proofs.
(Princeton), "If You Can Specify It, You Can Analyze It" - The Lasting Legacy of Philippe Flajolet.
The "Flajolet School" of the analysis of algorithms and combinatorial
structures is centered on an effective calculus, known as analytic
combinatorics, for the development of mathematical models that are
sufficiently accurate and precise that they can be validated through
scientific experimentation. It is based on the generating function as
the central object of study, first as a formal object that can
translate a specification into mathematical equations, then as an
analytic object whose properties as a function in the complex plane
yield the desired quantitative results. Universal laws of sweeping
generality can be proven within the framework, and easily
applied. Standing on the shoulders of Cauchy, Polya, de Bruijn, Knuth,
and many others, Philippe Flajolet and scores of collaborators
developed this theory and demonstrated its effectiveness in a broad
range of scientific applications. Flajolet's legacy is a vibrant field
of research that holds the key not just to understanding the
properties of algorithms and data structures, but also to
understanding the properties of discrete structures that arise as
models in all fields of science. This talk will survey Flajolet's
story and its implications for future research.
(University of Chile), Encoding Data Structures.
"A man ... endowed with an an exuberance of imagination which puts
it in his power to establish and populate a universe of his own creation".
Classical data structures can be regarded as additional information
that is stored on top of the raw data in order to speed up some kind
of queries. Some examples are the suffix tree to support pattern
matching in a text, the extra structures to support lowest common
ancestor queries on a tree, or precomputed shortest path information
on a graph.
(Cornell U.), Kleene Algebra with Tests and the Static Analysis of Programs.
Some data structures, however, can operate without accessing the
raw data. These are called encodings. Encodings are relevant when
they do not contain enough information to reproduce the raw data, but
just what is necessary to answer the desired queries (otherwise, any
data structure could be seen as an encoding, by storing a copy of the
raw data inside the structure).
Encodings are interesting because they can occupy much less space
than the raw data. In some cases the data itself is not interesting,
only the answers to the queries on it, and thus we can simply discard
the raw data and retain the encoding. In other cases, the data is used
only sporadically and can be maintained in secondary storage, while
the encoding is maintained in main memory, thus speeding up the most
When the raw data is available, any computable query on it can be
answered with sufficient time. With encodings, instead, one faces a
novel fundamental question: what is the effective entropy of the data
with respect to a set of queries? That is, what is the minimum size
of an encoding that can answer those queries without accessing the
data? This question is related to Information Theory, but in a way
inextricably associated to the data structure: the point is not how
much information the data contains, but how much information is
conveyed by the queries. In addition, as usual, there is the issue of
how efficiently can be the queries answered depending on how much
space is used.
In this talk I will survey some classical and new encodings,
generally about preprocessing arrays A[1, n] so as to answer
queries on array intervals [i, j] given at query time. I will
start with the classical range minimum queries (which is the minimum
value in A[i,j]?) which has a long history that culminated a
few years ago in an asymptotically space-optimal encoding of
2n+o(n) bits answering queries in constant time. Then I will
describe more recent (and partly open) problems such as finding the
second minimum in A[i, j], the k smallest values in
A[i, j], the kth smallest value in A[i, j], the elements
that appear more than a fraction τ of the times in A[i, j],
etc. All these queries appear recurrently within other algorithmic
problems, and they have also direct application in data mining.
Succinct data structures are data representations that use
the (nearly) the information theoretic minimum space, for the combinatorial
object they represent, while performing the necessary query operations
in constant (or nearly constant) time. So, for example, we can
represent a binary tree on n nodes in 2n + o(n) bits, rather than the
"obvious" 5n or so words, i.e. 5n lg(n) bits. Such a difference in memory
requirements can easily translate to major differences in runtime as a
consequence of the level of memory in which most of the data resides.
The field developed to a large extent because of applications in text indexing,
so there has been a major emphasis on trees and a secondary
emphasis on graphs in general; but in this talk we will draw attention to
a much broader collection of combinatorial structures for which succinct
structures have been developed. These will include sets, permutations,
functions, partial orders and groups, and yes, a bit on graphs.