Theory Seminar

Time: Announced by email; usually on Fridays
Place: Lindley Hall 101
Who: Open to faculty and students interested in theoretical aspects of computer science
Contact: Qin Zhang (qzhangcs@indiana.edu) or Jiecao Chen (jiecchen@indiana.edu)

Fall 2015 Schedule

Nov 13 | Amit Sahai (UCLA) on Software with Secrets (CS Colloquium)
Nov 04 | Leslie Valiant (Harvard) on How Nature Exploits Big Data: Learning and Evolution (CS Colloquium)
Oct 30 | Nirman Kumar (UCSB) on Geometric Computing over Uncertain Data
Oct 16 | Martha White on Adaptive Kernel Representations
Sep 25 | Qin Zhang on The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication

Spring 2014 Schedule

Jan 24 | Andrew McGregor on Analyzing Big Graphs via Sketching and Streaming (CS colloquium)
Jan 31 | Ronitt Rubinfeld on Something for almost nothing: Recent advances in sub-linear time algorithms (CS colloquium)
Feb 14 | Qin Zhang on Lower Bound Techniques for Multiparty Communication
Mar 14 | Shafi Goldwasser on The Cryptographic Lens (CS colloquium)
Mar 28 | Erfan Sadeqi Azer on Palindrome Recognition In The Streaming Model
Apr 11 | Yanfang Le on Online Load Balancing for MapReduce with Skewed Data Input
Apr 25 | Larry Moss on Complexity Results for “Natural Logics”

Fall 2013 Schedule

(Thanks to Hani T. Dawoud for organizing it.)

A Crash Introduction on Data Stream Algorithms
Speaker: Qin Zhang
Time: Thursday at 12:00pm
Place: Lindley Hall 101

In this talk I will give a crash introduction to data stream algorithms. In particular, I will overview some important and well-studied problems in the streaming model (for example, how to sample, how to find frequent items, and how to count the number of distinct elements), together with some proof ideas/sketches. The goal is to convey some basic ideas and techniques used in the design of data stream algorithms. No background on data streams is required.
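
The frequent-items problem mentioned above has a classic one-pass solution, the Misra–Gries summary (a standard textbook algorithm, not necessarily the talk's exact example): keep at most k-1 counters, and when a new item arrives with no free counter, decrement every counter. Any item occurring more than n/k times in a stream of length n is guaranteed to survive. A minimal sketch:

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k-1 counters.
    Any item occurring more than n/k times in a stream of length n
    is guaranteed to end up with a nonzero counter."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # no free counter: decrement all, dropping those that hit zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

A second pass over the stream can then verify which surviving candidates are truly frequent.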

Pseudorandomness I
Speaker: Hani T. Dawoud
Time: Thursday at 12:00pm
Place: Lindley Hall 101

Pseudorandom generators (PRG) are fundamental primitives in cryptography and computational complexity. A PRG is a polynomial-time computable function G that stretches a short random string s into a longer string G(s) that looks random to any efficient distinguisher. In this survey talk I will discuss definitions, applications, and constructions of PRGs. The focus of the talk will be on the proof techniques underlying the state-of-the-art PRG construction.

Pseudorandomness II
Speaker: Hani T. Dawoud
Time: Thursday at 12:00pm
Place: Lindley Hall 101

Pseudorandom generators (PRG) are fundamental primitives in cryptography and computational complexity. A PRG is a polynomial-time computable function G that stretches a short random string s into a longer string G(s) that looks random to any efficient distinguisher. In this survey talk I will discuss definitions, applications, and constructions of PRGs. The focus of the talk will be on the proof techniques underlying the state-of-the-art PRG construction.

Sublinear Time Algorithms/Property Testing for Large Data
Speaker: Funda Ergun
Time: Thursday at 12:00pm
Place: Lindley Hall 101

As we learned from Qin Zhang’s talk, if the data size is large, we would like to keep our space usage sublinear in the data size. If we take this a step further, we might want to keep our running time sublinear as well. However, this means that we do not even have enough time to read the entire input, and have to perform our computation using whatever small fraction of the input we can read. This is typically done by making random queries to the input and producing some kind of approximation as the answer. In this presentation we will see an introduction to work that explores the tradeoff between the amount of access to the input and the accuracy of the output, with a special emphasis on one particular model, combinatorial property testing. In this model, we are allowed to read a constant or, usually, at most polylogarithmic number of bits from the input and answer the question “is the input close to satisfying combinatorial property X?” using polylogarithmic space/time. We will see examples of property testers from various domains.
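
A concrete example of such a tester (a standard spot-checker for sortedness; the code and parameter choices here are illustrative, not taken from the talk): pick a random position, binary-search for its value, and reject if the search does not land back on that position. Sorted inputs always pass; inputs that are far from sorted fail with high probability after O((1/eps) log n) probes, assuming distinct elements.

```python
import random

def close_to_sorted(a, eps, trials=None):
    """Property tester for sortedness: always accepts a sorted list of
    distinct elements; rejects lists far from sorted with high probability.
    Each trial probes only the O(log n) positions on a binary-search path."""
    n = len(a)
    if trials is None:
        trials = max(1, int(10 / eps))
    for _ in range(trials):
        i = random.randrange(n)
        # binary search for a[i]; on a sorted list it must land on index i
        lo, hi = 0, n - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == a[i]:
                break
            elif a[mid] < a[i]:
                lo = mid + 1
            else:
                hi = mid - 1
        else:
            return False          # value not found: inconsistent ordering
        if mid != i:
            return False          # search landed elsewhere: reject
    return True
```

Note the tester never reads more than a polylogarithmic number of entries per trial, in the spirit of the model described above.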

An Introduction to Homotopy Type Theory
Speaker: Amr Sabry
Time: Thursday at 12:00pm
Place: Lindley Hall 101

Homotopy Type Theory is a new development that establishes surprising connections between computer science, logic, algebra, geometry, topology, and physics. In this talk, I will introduce and motivate the main definitions and outline some connections and applications.

Probabilistic Programming for the Uninitiated
Speaker: Rob Zinkov
When: Thursday at 12:00pm
Where: Lindley Hall 101

We often have data not easily explainable with existing tools. These are problems where the data may naturally cluster or possibly has hierarchical structure. Usually, we solve these problems by writing specialized software for performing inference from scratch. In this talk, I will show how probabilistic programming offers an alternative approach where we write declarative models of our data and get inference algorithms for free. I will use a Bayesian inference library called PyMC to demonstrate these ideas.

Sketching as a Tool for Numerical Linear Algebra
Speaker: David Woodruff, IBM Research Almaden
Host: Qin Zhang
Time: Thursday at 12:00pm
Place: Lindley Hall 101

I will discuss how sketching techniques from the data stream literature can be used to speed up well-studied algorithms for problems occurring in numerical linear algebra, such as least squares regression and approximate singular value decomposition. I will also discuss how they can be used to achieve very efficient algorithms for variants of these problems, such as robust and structured regression. In many cases the algorithms obtained using sketching are the only known ways to obtain asymptotically optimal algorithms.
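
As a toy illustration of the sketch-and-solve paradigm (my own minimal example, using a dense Gaussian sketch rather than the faster structured sketches such talks typically cover): compress an n-row least-squares problem min ||Ax - b|| to an m-row problem with m much smaller than n, then solve the small problem. The sketch below hard-codes two columns so the normal equations can be solved in closed form.

```python
import random

def sketch_and_solve(A, b, m):
    """Sketch-and-solve least squares for a 2-column matrix A:
    apply a random m x n Gaussian sketch S, then solve the sketched
    normal equations (S A)^T (S A) x = (S A)^T (S b) exactly."""
    n, d = len(A), len(A[0])                  # assumes d == 2
    scale = 1.0 / m ** 0.5
    S = [[random.gauss(0, 1) * scale for _ in range(n)] for _ in range(m)]
    SA = [[sum(S[i][k] * A[k][j] for k in range(n)) for j in range(d)]
          for i in range(m)]
    Sb = [sum(S[i][k] * b[k] for k in range(n)) for i in range(m)]
    # normal equations of the sketched problem (2x2 system)
    g = [[sum(SA[i][p] * SA[i][q] for i in range(m)) for q in range(d)]
         for p in range(d)]
    h = [sum(SA[i][p] * Sb[i] for i in range(m)) for p in range(d)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [(h[0] * g[1][1] - h[1] * g[0][1]) / det,
            (h[1] * g[0][0] - h[0] * g[1][0]) / det]
```

When b lies in the column span of A, the sketched solution recovers the exact solution; in general it is a (1+eps)-approximation for suitable m.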

Productivity of Stream Specifications
Speaker: Jörg Endrullis, Vrije Universiteit Amsterdam
When: Thursday at 12:00pm
Where: Lindley Hall 101

We are concerned with productivity (well-definedness) of specifications of infinite streams of data. In general, this property is undecidable, but for restricted formats sufficient conditions can be obtained. The usual analysis, also adopted here, disregards the identity of data, leading to approaches that we call data-oblivious. We present a method that is provably optimal among all such data-oblivious approaches. To model the quantitative behavior of stream operations we employ periodically increasing functions, which are a generalization of the linear functions employed by Hughes, Pareto and Sabry in ‘Proving the correctness of reactive systems using sized types’.

Lower Bound Techniques for Multiparty Communication
Speaker: Qin Zhang
Time: Friday, Feb 14 at 12pm
Place: Lindley Hall 325

Abstract:

In this talk we will discuss multiparty communication complexity in the message-passing model, where we have k machines, each holding a piece of data, which want to jointly compute some function defined on the union of the k data sets via communication. The communication is point-to-point, and the goal is to minimize the total communication between the k machines. This model captures all point-to-point distributed computational models for Big Data in terms of communication complexity, including the BSP model and the MapReduce model. Problems considered in this model include statistical and graph problems, numerical linear algebra, and database queries.
In this talk we will introduce new techniques developed in the last two years for proving communication lower bounds in the message-passing model. We believe that these techniques will have a wide applicability.

Palindrome Recognition In The Streaming Model

Speaker: Erfan Sadeqi Azer, Simon Fraser University
Time: Friday, Mar 28 at 12pm
Place: Lindley Hall 325
Abstract:

In the Palindrome Problem one tries to find all palindromes (palindromic substrings) in a given string. A palindrome is defined as a string which reads the same forwards as backwards, e.g., the string “racecar”. A related problem is the Longest Palindromic Substring Problem, in which finding an arbitrary one of the longest palindromes in the given string suffices. We consider the streaming version of both problems, where the input arrives over time and at every point in time we are only allowed to use sublinear space. The main algorithms in this paper are the following: The first one is a one-pass randomized algorithm that solves the Palindrome Problem with an additive error using O(sqrt(n)) space. The second algorithm is a two-pass algorithm which determines the exact locations of all longest palindromes; it uses the first algorithm as the first pass. The third algorithm is again a one-pass randomized algorithm, which solves the Longest Palindromic Substring Problem with a multiplicative error using only O(log(n)) space. We also give two variants of the first algorithm which solve other related practical problems.

Online Load Balancing for MapReduce with Skewed Data
Speaker: Yanfang Le, Simon Fraser University
Time: Friday, April 11 at 12pm
Place: Lindley Hall 325
Abstract:

MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. In this paper, we, for the first time, examine the problem of accommodating data skew in MapReduce with online operations. Unlike earlier heuristics that act in the very late reduce stage or after seeing all the data, we address the skew from the beginning of data input, and make no assumption about a priori knowledge of the data distribution, nor require synchronized operations. We examine the input in a continuous fashion and adaptively assign keys to reduce tasks; this assignment strategy is a constrained version of online minimum makespan and, in the MapReduce context where pairs with identical keys must be scheduled to the same machine, there is an online algorithm with a provable 2-competitive ratio. We further suggest a sample-based enhancement which, probabilistically, achieves a 3/2-competitive ratio with a bounded error.

Analyzing Big Graphs via Sketching and Streaming
Speaker: Andrew McGregor, University of Massachusetts (CS Colloquium)
Time: Friday, Jan 24 at 3pm
Place: Lindley Hall 102
Abstract:
Early work on data stream algorithms focused on estimating numerical statistics such as quantiles, frequency moments, and heavy hitters given limited memory and a single pass over the input. However, there’s now a growing body of work on analyzing more structured data such as graphs. In this talk, we survey research in this area with a focus on recent results on dimensionality reduction via random projections and processing dynamic graphs.

Something for almost nothing: Recent advances in sub-linear time algorithms

Speaker: Ronitt Rubinfeld, MIT (CS Colloquium)

Time: Friday, Jan 31 at 3pm

Place: Lindley Hall 102
Abstract:
We have long considered showing the existence of a linear time algorithm for a problem to be the gold standard of achievement. Indeed, at first thought, it is hard to imagine doing better than that, since for a nontrivial problem, any algorithm must consider all of the input in order to make a decision. However, as extremely large data sets are pervasive, it is natural to wonder what one can do in sub-linear time. Over the past decade, several surprising advances have been made on designing such algorithms. This talk will contain a non-exhaustive survey of this emerging area, highlighting recent progress and directions for further research.

The Cryptographic Lens
Speaker: Shafi Goldwasser, MIT (CS Colloquium)
Time: Friday, March 14 at 3pm
Place: Lindley Hall 102
Abstract:
Going beyond the basic challenge of private communication, in the last 35 years, cryptography has become the general study of correctness and privacy of computation in the presence of a computationally bounded adversary, and as such has changed how we think of proofs, reductions, randomness, secrets, and information.

In this talk I will discuss some beautiful developments in the theory of computing through this cryptographic lens, and the role cryptography can play in the next successful shift from local to global computation.

Complexity Results for “Natural Logics”
Speaker: Larry Moss
Time: Friday, April 25 at 3pm
Place: Lindley Hall 325
Abstract: By “Natural Logics”, I mean a family of logical systems inspired by reasoning in fragments of ordinary language. Much of my talk will be an introduction to this area and to its connections to theoretical computer science. (Usually when I speak on this topic I emphasize connections to other fields: natural language semantics, computational semantics, and especially logic.) I will present some of the algorithms for satisfiability in various logics, and also some complexity results. As always, there is a trade-off between complexity and expressive power.

Games, Learning, and the Price of Anarchy
Speaker: Éva Tardos, Cornell University (CS Colloquium)
Time: Friday, September 19, 2014 at 3:00pm
Place: Lindley Hall, Rm. 102

Abstract: Selfish behavior can often lead to suboptimal outcomes for all participants, a phenomenon illustrated by classical examples in game theory, such as the prisoner’s dilemma. In this talk, we will consider how to quantify the impact of strategic user behavior on overall performance. We will consider traffic routing as well as online auctions from this perspective, providing robust guarantees for their performance even when the system is not in equilibrium, assuming participants use learning strategies to deal with an uncertain environment.

Efficient Statistical Estimations for Distributed Inconsistent Data
Speaker: Qin Zhang
Time: Friday, Sept 26, 12:30pm – 1:30pm
Location: LH101
Abstract:  This talk will discuss the following general question: Given a set of databases connected by a point-to-point communication network, each having a dataset which is potentially inconsistent with the others, how can we perform communication-efficient statistical estimations on the union of these datasets? Here inconsistent means that the same real-world entity may appear in different forms in different datasets, but those variants are still close with respect to a certain distance measurement. We give a first set of communication-efficient solutions for statistical estimations on distributed inconsistent data, including algorithms for distinct elements, L0-sampling, heavy hitters, frequency moments, and empirical entropy.

Black-Box Separations in Cryptography
Speaker: Hani Dawoud
Time & Venue: Friday 12:15pm – 1:15pm in LH101
Abstract: A typical impossibility result in cryptography takes the following form: “Cryptographic functionality F cannot be based on complexity assumption A in a black-box manner”. This talk will survey techniques for proving such results.

Automatic Sequences and Zip Specifications

Speaker: Larry Moss
Time & Venue: Friday 12:15pm – 1:15pm in LH101

Abstract: This talk has two purposes: first, it is a gentle introduction to the area of automatic sequences; second, it presents a relatively new decidability result on the equivalence of “zip-specifications” of streams.  What the two purposes have in common is a concern with the zip operation on streams, and how streams may be defined using it. Streams are infinite sequences, and the zip of two given streams is their interleaving.  One key example is the Thue-Morse sequence

t = (0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0, . . . ).

Suppose we consider the stream system

x = 1: zip(x,y)  y = 0: zip(y,x)   z = 0: x

where : means prefixing. This system has a unique solution, and the solution for z is exactly the Thue-Morse sequence. There is a connection to automata theory that the talk explores. Further, one may ask whether two stream systems have the same solution (for two particular variables). I asked a few years ago whether this question is decidable, and it turns out that it is; this is the second part of the talk.

Joint work with Joerg Endrullis, Clemens Grabmeyer, Dimitri Hendriks, and Jan Willem Klop.
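
The system above can be checked mechanically. A small script (my own illustration, not code from the talk) unfolds the three equations and compares the resulting solution for z against the usual bit-parity definition of the Thue-Morse sequence:

```python
def thue_morse(n):
    # t[i] = parity of the number of 1-bits in the binary expansion of i
    return [bin(i).count("1") % 2 for i in range(n)]

def solve_zip_system(n):
    """Unfold  x = 1 : zip(x, y),  y = 0 : zip(y, x),  z = 0 : x
    far enough to produce the first n elements of z.
    Here zip interleaves: zip(a, b) = a0, b0, a1, b1, ..."""
    m = 2 * n + 2                      # enough elements of x and y
    x, y = [1], [0]
    while len(x) < m:
        i = len(x) - 1                 # next tail element of x is zip(x, y)[i]
        x.append(x[i // 2] if i % 2 == 0 else y[i // 2])
        j = len(y) - 1                 # next tail element of y is zip(y, x)[j]
        y.append(y[j // 2] if j % 2 == 0 else x[j // 2])
    z = [0] + x                        # z = 0 : x
    return z[:n]
```

Both functions agree, confirming that the unique solution for z is the Thue-Morse sequence quoted above.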

Streaming, Sketching and Combinatorial Optimization

Speaker: Sudipto Guha,  University of Pennsylvania
Time & Venue: Friday 3:00pm – 4:00pm in LH102 (Lindley Hall, Rm. 102)

Abstract: Streaming algorithms were designed to cope with the challenge of increasing data sizes. While significant progress has been made towards that goal, a question remains if the new developments in streaming algorithms have offered us insights in solving problems beyond streaming.

In this talk we will look at one such example, the problem of finding the edges of a maximum matching in graphs. The techniques developed for streaming algorithms allowed us to develop the first fully polynomial, almost-linear-time approximation schemes for b-matching problems (problems where vertices and edges have capacities), which had been open for thirty years. In turn, the insights from this result led us to newer characterizations of the matching polytope, which can be used to develop map-reduce algorithms for near-optimal maximum matching in a small number of rounds and sublinear space.

Spectral Sparsification of Large Graphs and Related Topics
Speaker: Jiecao Chen
Time & Venue: Friday 12:15pm – 1:15pm in LH101

Abstract: Graph sparsification is the approximation of an arbitrary graph by a sparse graph. In this talk we explain what it means for a graph to be spectrally similar to another. A sparse graph that is spectrally similar to the original graph turns out to be an important tool in the design of near-linear-time algorithms for SDD linear systems, which have already led to breakthrough results in combinatorial optimization.

We will also cover a lower bound result by Andoni et al., which shows that the algorithm given by Batson et al. is actually tight. However, our recent work (*) shows that by relaxing the definition of spectral similarity, we might use less space to approximate the large graph.

(*) Joint work with Bo Qin, David Woodruff and Qin Zhang

Shared-Memory Parallelism Can Be Simple, Fast and Scalable
Speaker: Julian Shun, Carnegie Mellon University.
Time & Venue: Monday 1:30pm – 2:30pm in IMU Oak Room

Abstract: With the growth of data in recent years, efficiently processing large data has become crucial. As such, many have turned to parallel computing to deliver high performance. One source of parallelism appears in shared-memory multicore processors, which have become prevalent, appearing in personal computers and even cellular phones. My research focuses on developing frameworks and tools for simplifying shared-memory programming and designing large-scale shared-memory algorithms that are efficient both in practice and in theory. In this talk, I will discuss Ligra, a shared-memory graph processing framework for simplifying the programming of shared-memory graph algorithms. The Ligra implementations outperform those written in previous graph processing systems and are competitive with optimized implementations. I will also discuss my work on practical large-scale shared-memory algorithms with strong theoretical guarantees for graph analysis and text processing, as well as tools for deterministic parallel programming.

Avoiding the curse of dimensionality with maximally predictive models
Speaker: Sarah Marzen (UC Berkeley)
Time & Venue: Friday 12:15pm – 1:15pm in LH101

Abstract: Time-series prediction suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures is, at first glance, exponential with length. The challenge is even more difficult for infinite-order Markov processes, since conditioning on finite sequences can never capture all of their past dependencies. In this talk, we’ll focus on how the curse of dimensionality affects predictive rate-distortion analyses, and how one can avoid the curse of dimensionality by using what are known as maximally predictive time series models. Spectral arguments suggest that algorithms which cluster finite-length sequences will fail most dramatically when the underlying process has long-range temporal correlations. We circumvent the curse of dimensionality by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Several simple examples show that the resulting “causal rate distortion theory” can substantially improve upon previous analyses in which the process is not “effectively Markovian”.

Title: Recursion versus tail recursion over abstract structures
Time & Venue: Friday 1:30pm – 2:30pm in LH101

Abstract: There are several ways to understand computability over first-order structures. We may allow functions given by arbitrary recursive definitions, or we may restrict ourselves to “iterative” functions computable by nothing more complicated than while loops.

In the classical case of recursion over the natural numbers, these two notions of computability coincide. However, this is not true in general. We ask whether there is a model-theoretic classification of structures over which iteration is as powerful as recursion.

In this talk I will discuss some conditions which affect this outcome one way or the other. I will also give a few examples of “intermediate” structures for which the question of recursion vs. iteration reduces to hard open problems in computational complexity.

Title: Geometric Computing over Uncertain Data
Speaker: Nirman Kumar (UCSB)
Time & Venue: Friday 12:15pm – 2:15pm in LH101

Abstract: The modern age is undoubtedly the age of data. Technological inventions have made the process of sensing, acquiring, and storing data easy to an extent that we have “too much of data”. However, not all of this data is precise — data imprecision and uncertainty is prevalent with huge amounts of data, geometric or otherwise. In this talk I will focus on computing some basic geometric primitives when the data is uncertain, and modeled in the existential model — each point or object in question is known to be active or only exists with a certain probability p. I will talk about some recent work on computing probabilities of separability, containment and evasion, and expected range-max queries, when points have some associated values. While many such problems are computationally hard, we will see some algorithmic techniques in low dimensions, and some that extend to any constant dimension, to compute probabilities and expected values.
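
One of these primitives, the expected maximum of a point set in the existential model, has a simple exact algorithm in one dimension: sort the values in decreasing order; the first active point in that order realizes the maximum, so E[max] = Σ_i v_i · p_i · Π_{j<i} (1 − p_j), taking the maximum to be 0 when no point is active. A sketch of this calculation (my illustration, not code from the talk):

```python
def expected_range_max(points):
    """Exact expected maximum over uncertain points in the existential
    model.  points: list of (value, probability) pairs; each point exists
    independently with its given probability.  O(n log n) time."""
    pts = sorted(points, key=lambda vp: -vp[0])   # decreasing by value
    none_active_before = 1.0
    exp_max = 0.0
    for v, p in pts:
        # v is the maximum iff this point is active and no larger one is
        exp_max += v * p * none_active_before
        none_active_before *= 1.0 - p
    return exp_max
```

The same conditioning idea (sweep in value order, track the probability that nothing larger exists) underlies many algorithms for existential uncertainty.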

Title: How Nature Exploits Big Data: Learning and Evolution
Speaker: Leslie Valiant
Time & Venue: Wed 4:00PM – 5:00PM Swain West 119

Abstract: The term “Big Data” refers to the opportunities and challenges offered by the recent availability of unprecedented amounts of digital data. Many of the most exciting opportunities on the positive side are encompassed by the general area of machine learning, whose methods seek to discover usable generalizations from raw data. However, the need to find useful generalizations from experience has been the central goal of learning and adaptation mechanisms in biological organisms from the beginning. Further, as we shall argue, biological evolution is a similar process that succeeds in making effective changes based on the collective experience of many individuals, here over many generations rather than a single life. We shall review the commonality between learning and evolution in terms of quantitative computational theories. Intelligence, as colloquially understood, is also clearly related to learning. However, attempts to capture this notion by quantitative computational theories, have not succeeded to date. We shall discuss the possible reasons and consequences.

Title: Software with Secrets
Speaker: Amit Sahai
Time & Venue: Fri 03:00PM – 04:00PM Lindley Hall, Rm. 102

Abstract: The goal of secure program obfuscation is to make a program “unintelligible” while preserving its functionality. For decades, program obfuscation for general programs has remained an art, with all public general-purpose obfuscation methods known to be broken.
In this talk, we will describe new developments that for the first time provide a mathematical approach to the problem of general-purpose program obfuscation, where extracting secrets from the obfuscated program requires solving mathematical problems that currently take hundreds of years to solve on the world’s fastest computing systems. We will also discuss the implications of these developments.

Title: The decidability of linear logic and its fragments
Speaker: Katalin Bimbo
Time & Venue: Feb. 19, Friday 12:45pm – 1:45pm in LH101

Abstract: Linear logic was introduced by J.-Y. Girard in 1987. This talk surveys the current status of the problem of decidability for fragments and versions of propositional linear logic. Two interesting results detailed in the talk are the decidability of MELL (multiplicative-exponential linear logic) and the decidability of LL (full classical propositional linear logic) from 2014. I will also discuss the well-known paper by Lincoln, Mitchell, Scedrov and Shankar (1992), in which the authors claimed that LL is undecidable.

Some of the results mentioned in this talk are products of joint research with J. Michael Dunn.

Title: Linear sketching using parities
Speaker: Grigory Yaroslavtsev
Time: Aug. 31, 12:00pm – 1:30pm
Venue: LH101

Abstract: In recent years linear sketching has emerged as a powerful tool for approximate computing in settings with limited resources, including distributed computation and streaming. It has been used in breakthrough results on graph and matrix processing, dimensionality reduction, etc. Strikingly, linear sketches have been shown to be optimal for dynamic stream processing under fairly mild assumptions.

In this talk I will describe a new study of linear sketching that focuses on understanding the power of linear sketches based on parities (i.e., over GF_2, the field of two elements), as compared to previous work that uses real arithmetic. I will illustrate various properties of such sketches using Fourier-analytic methods and tools from communication complexity. In particular, linear sketching over GF_2 turns out to be closely related to Fourier sparsity with respect to Lp-norms. Moreover, it can be shown to be (almost) optimal in streaming and distributed settings for data generated according to the uniform distribution.

Joint work with Sampath Kannan (UPenn) and Elchanan Mossel (MIT).
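
A linear sketch over GF_2 in the above sense is just a random 0/1 matrix applied to the input modulo 2: each output bit is a parity (XOR) of a subset of the input bits, so the sketch of an XOR of two inputs equals the XOR of their sketches. A toy illustration (dimensions and names are my own, not from the talk):

```python
import random

def rand_parity_matrix(m, n, seed=0):
    """Random m x n 0/1 matrix; row i selects the subset of input bits
    whose parity becomes output bit i."""
    rng = random.Random(seed)
    return [[rng.randrange(2) for _ in range(n)] for _ in range(m)]

def sketch_gf2(matrix, x):
    """Apply the sketch over GF_2: each output bit is the XOR of the
    input bits selected by the corresponding row."""
    return [sum(a & v for a, v in zip(row, x)) % 2 for row in matrix]
```

Linearity is what makes such sketches composable under dynamic (turnstile-style) updates: updates can be sketched independently and XORed together.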

Title: Edit Distance: Sketching, Streaming and Document Exchange
Speaker: Qin Zhang
Time: 12:00pm – 1:30pm
Venue: LH101

Abstract: We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of size $O(K(\log^2 K+\log n))$ bits such that Bob can recover $x$ using the message and his input $y$ if the edit distance between $x$ and $y$ is no more than $K$, and output “error” otherwise. Both the encoding and decoding can be done in time $\tilde{O}(n+\poly(K))$. This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold $x$ and $y$ respectively, they can compute sketches of $x$ and $y$ of sizes $\poly(K \log n)$ bits (the encoding), and send them to the referee, who can then compute the edit distance between $x$ and $y$ together with all the edit operations if the edit distance is no more than $K$, and output “error” otherwise (the decoding). To the best of our knowledge, this is the {\em first} result for sketching edit distance using $\poly(K \log n)$ bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the {\em first} streaming algorithm for computing edit distance and all the edits exactly using $\poly(K \log n)$ bits of space.

Joint work with Djamal Belazzougui (CERIST, Algeria).

Title: The Quantum Computer Puzzle
Speaker: Gil Kalai
Time: 4:00PM – 5:00PM
Venue: SE 140

Abstract: Quantum computers are hypothetical devices, based on quantum physics, which would enable us to perform certain computations hundreds of orders of magnitude faster than digital computers. This feature is coined “quantum supremacy,” and one aspect or another of quantum supremacy might be seen by experiments in the near future: by implementing quantum error-correction or by systems of free bosons or by exotic new phases of matter called anyons or by quantum annealing, or in various other ways. We start the lecture with a gentle introduction to computing – classical and quantum, with basic notions of computational complexity, and with the vision of quantum computers.
A main reason for concern regarding the feasibility of quantum computers is that quantum systems are inherently noisy. We will explain what is “noise” and describe an optimistic hypothesis regarding quantum noise that will allow quantum computing and a pessimistic hypothesis that won’t. The remarkable progress witnessed during the past two decades in the field of experimental physics of controlled quantum systems places the decision between the pessimistic and optimistic hypotheses within reach.

In the lecture I will explain my pessimistic line of research and here is a brief summary of my view: understanding quantum computers in the presence of noise requires consideration of behavior at different scales. In the small scale, standard models of noise from the mid-90s are suitable, and quantum evolutions and states described by them manifest a very low-level computational power. This small-scale behavior has far-reaching consequences for the behavior of noisy quantum systems at larger scales. On the one hand, it does not allow reaching the starting points for quantum fault tolerance and quantum supremacy, making them both impossible at all scales. On the other hand, it leads to novel implicit ways for modeling noise at larger scales and to various predictions on the behavior of noisy quantum systems.

The expected behavior of noisy quantum systems at the small scale relies on the theory of noise-sensitivity and noise-stability developed with Benjamini and Schramm in the late 90s and on recent work with Guy Kindler related to the mysterious gap between permanents and determinants (or, in other words, between bosons and fermions). This theory will be developed and discussed in the second lecture.

In the third lecture we will draw consequences from the study of noise-stability and noise-sensitivity for the behavior of noisy quantum systems in the small scale, and consequences of the failure of quantum fault-tolerance and quantum supremacy for modeling and understanding noisy quantum systems on all scales.

Title: Optimal Sparse Designs for Process Flexibility
Speaker: Yuan Zhou
Time: 12:00pm – 1:30pm
Venue: LH101

Abstract: We study the problem of how to design a sparse flexible process structure in a balanced and symmetrical production system to match supply with random demand more effectively. Our goal is to provide the sparsest design that achieves (1-e)-optimality relative to the fully flexible system. In a system with n plants and n products, Chou et al. proved that there exists a graph expander with O(n/e) arcs achieving (1-e)-optimality for every demand realization. In this paper, we introduce a new concept called probabilistic graph expanders. We prove that a probabilistic expander with O(n ln(1/e)) arcs guarantees (1-e)-optimality with high probability (w.h.p.). On the lower bound side, we show that any structure needs Omega(n ln(1/e)) arcs to achieve (1-e)-optimality w.h.p. We also discuss how to extend our optimal design to general production systems with n plants, m products, and non-identically distributed supplies and demands.
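The order of magnitude of the design can be illustrated with a small sketch. This is not the paper's probabilistic-expander construction; it is a hypothetical random design (names and choices are mine) that merely realizes the n*ln(1/e) arc count discussed above by connecting each product to ceil(ln(1/e)) random plants.

```python
import math
import random

def random_sparse_design(n, eps, seed=0):
    """Hypothetical sketch: connect each of the n products to
    d = ceil(ln(1/eps)) plants chosen at random, for ~n*ln(1/eps) arcs
    in total -- the order shown optimal in the abstract. The paper's
    actual construction and analysis may differ."""
    rng = random.Random(seed)
    d = max(1, math.ceil(math.log(1 / eps)))
    return {product: rng.sample(range(n), min(d, n)) for product in range(n)}

design = random_sparse_design(10, 0.05)       # 10 plants, 10 products, e = 0.05
arcs = sum(len(plants) for plants in design.values())
```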

Top

Title: Fast and Space-Optimal Differentially-Private Low-Rank Factorization in the General Turnstile Update Model
Speaker: Jalaj Upadhyay
Time: 12:00pm – 1:30pm
Venue: LH101

Abstract: The problem of low-rank factorization of an m x n matrix A requires outputting a singular value decomposition: an m x k matrix U, an n x k matrix V, and a k x k diagonal matrix D such that U D V^T approximates the matrix A in the Frobenius norm. In this paper, we study releasing differentially-private low-rank factorization of a matrix in the general turnstile update model. We give two differentially-private algorithms instantiated with respect to two levels of privacy. Both of our privacy levels are stronger than privacy levels for this and related problems studied in previous works, namely that of Blocki et al. (FOCS 2012), Dwork et al. (STOC 2014), Hardt and Roth (STOC 2012, STOC 2013), and Hardt and Price (NIPS 2014). Our main contributions are as follows.
1. In our first level of privacy, we consider two matrices A and A’ as neighboring if A – A’ can be represented as an outer product of two unit vectors. Our private algorithm with respect to this privacy level incurs optimal additive error. We also prove a lower bound that shows that the space required by this algorithm is optimal up to a logarithmic factor.
2. In our second level of privacy, we consider two matrices as neighboring if their difference has the Frobenius norm at most 1. Our private algorithm with respect to this privacy level is computationally more efficient than our first algorithm and incurs optimal additive error.
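To fix the notation above: this is a plain, non-private rank-k factorization computed by truncating the SVD (it is not the paper's differentially-private algorithm, only the object being released).

```python
import numpy as np

def low_rank_factorization(A, k):
    """Return U (m x k), D (k x k diagonal), V (n x k) such that
    U @ D @ V.T is the best rank-k approximation of A in Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))  # a rank-2 matrix
U, D, V = low_rank_factorization(A, 2)
err = np.linalg.norm(A - U @ D @ V.T)  # Frobenius error; ~0 since rank(A) = 2
```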

Short bio: Jalaj Upadhyay obtained his Ph.D. from the University of Waterloo under the supervision of Douglas R. Stinson. He is currently a post-doctoral fellow working with Sofya Raskhodnikova and Adam Smith.

Top

Title: Applications of model theory to lower bounds
Time: 12:00pm – 1:30pm
Venue: LH101

Abstract: We look at computability over abstract (data) structures, and argue that the model theory of those structures can have important consequences for computability and lower bounds. Specifically, we look at type-counting lower bounds, where "types" is meant in the model-theoretic sense. We apply these to obtain separation results between "recursive programs" and "iterative" or "tail recursive" programs and, at the end, state a general hypothesis about computation over abstract structures.

Top

Title: Complexity of Computation on Algebraic Data
Speaker: Elena Grigorescu
Time: 3:00PM – 4:00PM
Venue: LH102

Abstract: Error-correcting codes and point lattices are central primitives in modern communication, storage and cryptographic systems. Their common algebraic structure plays a role both in their applicability across a diverse range of domains, and in the availability of mathematical tools for analyzing specific computational tasks.

In this talk I will discuss a few core computational problems on codes and lattices, and their current status in global and local models. In the global model, I will focus on a recent result on decoding Reed-Solomon codes. In the local model, I will describe recent sublinear models of computation on codes and lattices. Finally, I will highlight some fundamental open problems.

Top

Title: Embeddings of the Heisenberg Group and the Sparsest Cut Problem.
Speaker: Robert Young
Time: 4:00PM – 5:00PM
Venue: SE140

Abstract: (joint work with Assaf Naor) The Heisenberg group H is a sub-Riemannian manifold that is hard to embed in Rn. Cheeger and Kleiner introduced a new notion of differentiation that they used to show that it does not embed nicely into L1. This notion is based on surfaces in H, and in this talk, we will describe new techniques that let us quantify the “roughness” of such surfaces, find sharp bounds on the distortion of embeddings of H, and estimate the accuracy of an approximate algorithm for the Sparsest Cut Problem.

Top

Title: Streaming for Aibohphobes: Longest Near-Palindrome under Hamming Distance.
Speaker: Samson Zhou
Time: 12:00PM – 1:30PM
Venue: LH101

Abstract: (Joint work with Elena Grigorescu and Erfan Sadeqi Azer) Palindromes are strings which read the same backwards and forwards, such as “racecar”. Given a metric and an integer d>0, a d-near-palindrome is a string of distance at most d from its reverse. This talk will cover the problem of identifying the longest d-near-palindrome in data streams under the Hamming distance. The problem is relevant to the analysis of DNA databases, and to the task of repairing recursive structures in documents such as XML and JSON.

We present the first streaming algorithm for the longest d-near-palindrome problem, which returns a d-near-palindrome whose length is within a multiplicative (1+c)-factor of the longest d-near-palindrome, for "small" d. We present additional streaming algorithms that return a d-near-palindrome whose length is within an additive E*sqrt(n) term of the longest d-near-palindrome in one pass, as well as the exact longest d-near-palindrome in two passes. Finally, we provide lower bounds on the space necessary to approximate the longest d-near-palindrome.
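As a point of reference for the streaming algorithms, the definition above can be made concrete with a naive offline sketch (function names are mine, not the paper's; the brute force uses O(n) space, which is exactly what the streaming algorithms avoid).

```python
def is_d_near_palindrome(s, d):
    """True iff the Hamming distance between s and its reverse is at most d."""
    return sum(a != b for a, b in zip(s, reversed(s))) <= d

def longest_d_near_palindrome(s, d):
    """Offline brute force: longest substring of s that is a d-near-palindrome.
    Quadratically many substrings are checked -- illustration only."""
    best = ""
    n = len(s)
    for i in range(n):
        # only look at substrings strictly longer than the current best
        for j in range(i + len(best) + 1, n + 1):
            if is_d_near_palindrome(s[i:j], d):
                best = s[i:j]
    return best
```

Note that a mismatched pair of positions is counted twice by the Hamming distance to the reverse, so e.g. "racecat" is a 2-near-palindrome but not a 1-near-palindrome.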

Top

Title: Real Stable Polynomials, Strongly Rayleigh distributions and Algorithmic Applications
Speaker: Shayan Oveis Gharan
Time: 4:00PM-5:00PM
Venue: Swain East 140

Abstract: A multivariate polynomial p(z1,…,zn) is stable if p(z1,…,zn)≠0 whenever ℑ(zi)>0 for all i. Strongly Rayleigh distributions are probability distributions on 0-1 random variables whose generating polynomial is stable. They can be seen as a natural generalization of determinantal distributions. Borcea, Branden and Liggett used the geometry of stable polynomials to prove numerous properties of strongly Rayleigh distributions, including negative association, and closure under conditioning and truncation.
In this talk I will go over basic properties of stable polynomials and strongly Rayleigh distributions; then, I will describe algorithmic applications in counting, sampling and optimization.

Based on joint works with Nima Anari, Alireza Rezaei, Amin Saberi, Mohit Singh.

Top

Title: Multisets and a ternary relation
Speaker: Shayan Oveis Gharan
Time: 12:00PM-1:30PM
Venue: LH101

Abstract: Multisets are a useful data type situated between sets and lists. Finite multisets over a denumerable base set can be represented by positive integers. In this talk, I will show how insights concerning multisets and the relevant divisibility relation (from the semantics of R) can be applied to the decidability of linear logic.

Linear logic was claimed to be undecidable in Lincoln et al. (1992) and Kanovich (1995). It was first proved decidable in KB & J. M. Dunn (2015) using sequent calculi.
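The integer representation of finite multisets mentioned in the abstract can be made concrete via prime factorizations; a minimal sketch (the talk's actual representation may differ):

```python
def primes(k):
    """First k primes, by trial division."""
    out, n = [], 2
    while len(out) < k:
        if all(n % p for p in out):
            out.append(n)
        n += 1
    return out

def encode(multiset):
    """Encode a finite multiset of small naturals as a positive integer:
    element i with multiplicity m contributes the factor primes[i]**m.
    Multiset union (sum) becomes multiplication, and the sub-multiset
    relation becomes divisibility."""
    ps = primes(max(multiset, default=0) + 1)
    code = 1
    for x in multiset:
        code *= ps[x]
    return code

a = encode([0, 0, 1])  # {0, 0, 1} -> 2 * 2 * 3 = 12
b = encode([0, 1])     # {0, 1}    -> 2 * 3 = 6
# b is a sub-multiset of a  <=>  encode(b) divides encode(a)
```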

Top

Title: Introduction to Discrete Quantum Theories and Computing
Speaker: Yu-Tsung Tai
Time: 12:00PM-1:30PM
Venue: LH101

Abstract: Quantum computing is fascinating because it offers the possibility of computing faster than a Turing machine. For example, Shor found a polynomial-time quantum algorithm for prime factorization, a problem with no known efficient classical algorithm. However, most mathematical models of quantum computing depend on uncomputable numbers, that is, on the continuum of real numbers, even though the idea of computable numbers is of foundational significance in computer science.
In this talk, we will review the conventional quantum computing model and basic quantum algorithms, including Deutsch's algorithm and the Deutsch-Jozsa algorithm. We will then introduce quantum theory and computing based on finite fields. Although this theory can express deterministic algorithms, it is too weak to implement any probabilistic quantum algorithm. At the same time, it is so powerful that it can be used to solve UNIQUE-SAT, which is as hard as the general satisfiability problem.
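Deutsch's algorithm mentioned above can be simulated with 4x4 matrices. A minimal sketch in the conventional (continuous) model, not the finite-field theory of the talk: one oracle query decides whether f : {0,1} -> {0,1} is constant or balanced.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
I = np.eye(2)

def oracle(f):
    """U_f |x, y> = |x, y XOR f(x)>, as a 4x4 permutation matrix
    (basis index of |x, y> is 2*x + y)."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    state = np.kron([1, 0], [0, 1])        # |0>|1>
    state = np.kron(H, H) @ state          # Hadamard both qubits
    state = oracle(f) @ state              # single oracle query
    state = np.kron(H, I) @ state          # Hadamard the first qubit
    p1 = state[2] ** 2 + state[3] ** 2     # P(first qubit measures 1)
    return "balanced" if p1 > 0.5 else "constant"
```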

BIO: Yu-Tsung Tai is a Ph.D. candidate in Mathematics and Computer Science at Indiana University, Bloomington. His research interest is the mathematical foundation of quantum computing, and he expects to receive his Ph.D. degree in May 2018.

Top

Speaker: Jiecao Chen
Time: 12:00PM-1:30PM
Venue: LH101

Abstract: This talk will cover the problem of finding the top-K arms in a multi-armed bandit game. In a multi-armed bandit game, we assume there are n alternative arms, where the i-th arm is associated with an unknown reward distribution D_i with mean θ_i. The agent sequentially pulls arms, and each pull of the i-th arm yields an i.i.d. reward drawn from D_i. The goal is to find K arms with large means. Solutions to this problem find applications in crowdsourcing, A/B/C testing, etc. We will discuss this problem under several different settings, focusing on adaptive solutions.
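The setting can be illustrated with the simplest non-adaptive baseline (my sketch, not an algorithm from the talk): pull every arm equally often and keep the K largest empirical means. The adaptive algorithms the talk focuses on improve on this by allocating pulls unevenly.

```python
import random

def naive_top_k(arms, k, pulls_per_arm=200):
    """Non-adaptive baseline: sample each arm the same number of times
    and return the indices of the k arms with the highest empirical means."""
    means = []
    for i, arm in enumerate(arms):
        total = sum(arm() for _ in range(pulls_per_arm))
        means.append((total / pulls_per_arm, i))
    return [i for _, i in sorted(means, reverse=True)[:k]]

# Hypothetical instance: Bernoulli arms with means 0.9, 0.8, 0.3, 0.2.
random.seed(1)
arms = [lambda p=p: 1 if random.random() < p else 0 for p in (0.9, 0.8, 0.3, 0.2)]
top = naive_top_k(arms, 2)  # the two large-mean arms, w.h.p.
```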

Part of this talk is based on our ICML’17 paper (joint work with Xi Chen, Qin Zhang, Yuan Zhou).

Top

Title: Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem
Time: 12:00PM-1:30PM
Venue: LH101

Abstract: We study the maximum k-set coverage problem in the following distributed setting. A collection of input sets S1, …, Sm over a universe [n] is partitioned across p machines, and the goal is to find k sets whose union covers the largest number of elements. The computation proceeds in rounds. In each round, all machines simultaneously send a message to a central coordinator, who then communicates back to all machines a summary to guide the computation for the next round. At the end of the last round, the coordinator outputs the answer. The main measures of efficiency in this setting are the approximation ratio of the returned solution, the communication cost of each machine, and the number of rounds of computation.
Our main result is an asymptotically tight bound on the tradeoff between these three measures for the distributed maximum coverage problem: any r-round protocol for this problem either incurs a communication cost of m^{Omega(1/r)} or only achieves an approximation factor of k^{Omega(1/r)}; both these factors are also tight. We further use our results in this distributed setting to obtain new bounds for the maximum coverage problem in two other main models of computation for massive datasets, namely, the dynamic streaming model and the MapReduce model.
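For contrast with the distributed, round-limited setting above, the classical centralized greedy algorithm achieves a (1 - 1/e)-approximation for maximum k-coverage; a minimal sketch:

```python
def greedy_max_coverage(sets, k):
    """Centralized greedy: repeatedly pick the set covering the most
    new elements. Achieves a (1 - 1/e)-approximation; the distributed
    setting of the talk must trade this off against rounds/communication."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
chosen, covered = greedy_max_coverage(sets, 2)  # picks the 4-element set first
```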

Based on joint work with Sanjeev Khanna (to appear in SODA’18)

Top

Title: Round Compression for Parallel Matching Algorithms
Speaker: Krzysztof Onak
Time: 12:00PM-1:30PM
Venue: LH101

Abstract: The last decade has witnessed the success of massive parallel systems such as MapReduce, Hadoop, Dryad, or Spark. Compared to the classic models of parallel and distributed computation, these systems allow for more local computation on a larger amount of local data in each parallel round. The fundamental question that arises in this context is: can we leverage this additional power to obtain algorithms that use fewer parallel rounds?

In this context, we consider one of the most classic graph problems: maximum matching. It is known that if the amount of memory per machine is at least n^(1+Omega(1)), where n is the number of vertices, then a constant number of rounds suffices to compute a 2-approximation via a maximal matching. If the memory is near-linear or sublinear in n, nothing substantially better than simulating the classic PRAM or distributed algorithms for maximal matching has been known. For memory linear (or even slightly sublinear) in n, we give the first algorithm that runs in poly(log log n) rounds and computes a near-maximal matching.
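The maximal-matching baseline mentioned above is, sequentially, a one-pass greedy scan; any maximal matching has at least half the size of a maximum matching, hence the 2-approximation. A minimal sketch (the talk's contribution is parallelizing this kind of computation into few rounds):

```python
def maximal_matching(edges):
    """Greedy maximal matching: take each edge whose endpoints are both
    still unmatched. The result is maximal, and therefore at least half
    the size of a maximum matching (a 2-approximation)."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```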

In order to establish the result, we introduce a technique of round compression. It allows us to execute a superconstant number of rounds of a distributed algorithm in a constant number of rounds of a massive parallel system. A crucial technical insight here is the use of vertex-based partitioning instead of edge-based partitioning methods of earlier works. Our result and techniques have already inspired follow-up research.

Joint work with Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, and Piotr Sankowski.