Sampling cannot make hard work light

Robert Long recently wrote a blog: Five ways the mind does not solve computationally intractable problems (and one way it does). It summarises a paper that I co-wrote. To be frank, I was a bit jealous of the clarity and conciseness with which Robert laid out the main arguments (it took us 20 pages!).

I recommend reading Robert’s blog before reading on (perhaps also have a peek at this thread). I won’t reiterate all points here. I’ll be focussing instead on Elizabeth Bonawitz’s kind invitation to give my take on her and others’ hypothesis that minds use sampling as a method for approximating intractable Bayesian inference. For highlights of Elizabeth and co-authors’ paper see this thread.

There are several ways in which one can understand the claim that sampling provides a way of approximating intractable Bayesian inference. One way of understanding the word “approximation” is as denoting an inexact method. In theoretical computer science an inexact algorithm is also called a heuristic. Heuristics can perform well on some instances of a problem, but can be arbitrarily off on other instances. In contrast, the word “approximation” is typically reserved for approximation algorithms, which make (and keep) some promise about the quality of the estimate.1

Cognitive scientists regularly appeal to heuristics or approximation algorithms (though the two are seldom distinguished) as algorithmic-level strategies for fencing off the charge that computational-level theories are intractable. Neither strategy is unproblematic, however.

On the one hand, heuristics cause algorithmic and computational level theories to be arbitrarily predictively different for infinitely many, unidentified inputs to the problem (van Rooij et al., 2012). On the other hand, approximation algorithms are themselves generally intractable (Kwisthout and van Rooij, 2011; van Rooij and Wareham, 2012).

I would be naive not to anticipate a variety of objections to the above. In fact, over the years I have invested quite some energy in rebutting, or otherwise qualifying, such objections (van Rooij, 2008; van Rooij et al., 2012, 2018). For instance, in our book Cognition and Intractability we rebut 14 of the most common objections to the idea that tractability constrains computational-level theories:2

  1. The Empiricist Objection
  2. The Cognition-Is-Not-Computation Objection
  3. The Super-Human Objection
  4. The Heuristics Objection
  5. The Average-Case Objection
  6. The Parallelism Objection
  7. The Non-Determinism Objection
  8. The Quantum Computing Objection
  9. The Small-Inputs Objection
  10. The Nothing-New Objection
  11. The Too-Liberal Objection
  12. The Turing-Machine Objection
  13. The Approximation Objection
  14. The As-If Objection

To the extent that the ‘sampling hypothesis’ can be understood as a proposal for how minds approximate, at the algorithmic level, Bayesian inferences that are intractable at the computational level, the hypothesis seems to span (4), (7), (13) and (14) in the list above. Points (4) and (13) I covered above, and (14) was covered in Robert’s blog. So let’s look in a bit more depth at (7).3

A sampling algorithm is a non-deterministic (probabilistic or randomised) algorithm in the sense that its computation paths and outputs can vary across runs of the algorithm on one and the same input.4 As Bonawitz et al. (2014) note, sampling algorithms can be used to estimate the shape or mode, or other cognitively useful properties, of posterior distributions (e.g., the mode could specify the most probable hypothesis given the data). If properly configured, and given a sufficient number of samples, the output of a sampling algorithm can approximate the to-be-estimated value to an arbitrary degree.5
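To make this concrete, here is a minimal sketch (in Python, with a made-up posterior over four hypotheses; nobody’s actual model) of how the mode can be estimated from samples. Note that the sketch sidesteps the hard part: in real models the posterior is not available in this explicit form, and generating (approximately) correct samples from it is precisely where the computational work lies.

```python
# Minimal illustrative sketch: estimate the mode of a small, made-up discrete
# posterior by drawing samples from it and taking the most frequent hypothesis.
import random
from collections import Counter

posterior = {"h1": 0.05, "h2": 0.55, "h3": 0.30, "h4": 0.10}  # hypothetical P(h | d)

def sample_posterior(n, rng=random.Random(0)):
    """Draw n samples from the (here explicitly given) posterior."""
    hypotheses = list(posterior)
    weights = [posterior[h] for h in hypotheses]
    return rng.choices(hypotheses, weights=weights, k=n)

samples = sample_posterior(n=20)
estimated_mode = Counter(samples).most_common(1)[0][0]
print(estimated_mode)  # with these probabilities, almost always "h2"
```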

For some inference problems a small number of samples already suffices to give good estimates (cf. Vul et al., 2014). I assume that this observation, combined with the widespread idea that intractability is an artefact of deterministic and exact computation, has fuelled the intuition that sampling is a sound algorithmic-level way of dealing with computational-level intractability.

Leaving aside that neither determinism nor exactness is a cause of intractability,6 the sampling hypothesis overlooks an important caveat: If a problem is intractable to compute deterministically (formally, NP-hard), then probabilistically approximating it within reasonable bounds requires an exponential number of samples.7 This means that if a sampling algorithm yields good (enough) approximations using only a reasonable number of samples, then the Bayesian inference problem that it is solving must be tractable.
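To get an intuition for where an exponential number of samples can come from, consider the following toy illustration (a back-of-the-envelope example of my own, not the complexity-theoretic argument referenced in footnote 7). In rejection sampling one draws samples from the prior and keeps only those consistent with the observed evidence; if the evidence fixes the values of k independent binary variables, the acceptance probability is 2^-k, so the expected number of prior samples needed per usable posterior sample grows as 2^k.

```python
# Toy illustration of exponential sample blow-up in rejection sampling:
# draw from a prior of k independent fair binary variables and accept only
# samples that match the evidence 'all variables equal 1'. The acceptance
# probability is 2**-k, so accepting one sample takes about 2**k attempts.
import random

def prior_samples_until_accept(k, rng):
    """Count prior samples drawn until one matches the evidence."""
    attempts = 0
    while True:
        attempts += 1
        if all(rng.random() < 0.5 for _ in range(k)):
            return attempts

rng = random.Random(0)
for k in (2, 5, 10):
    mean = sum(prior_samples_until_accept(k, rng) for _ in range(200)) / 200
    print(k, round(mean))  # grows roughly like 2**k: about 4, 32, 1024
```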

In other words, any demonstration in the cognitive science literature that a sampling algorithm can approximate a Bayesian inference problem well, using a small or otherwise not unreasonably large number of samples, ipso facto shows that the inference problem being approximated is tractable. This tractable problem may still be a form of Bayesian inference, but since we know that Bayesian inference for unrestricted domains can be intractable, it must then be Bayesian inference for a restricted domain.

Human rational behavior (…) is shaped by a scissors whose two blades are the structure of task environments and the computational capabilities of the actor.

– Herbert Simon (1990)

Now domain restriction need not make the Bayesian inference trivial. On the contrary, the domain may still contain infinitely many possible inputs, just not the ones that cause trouble for the sampling algorithm. There are complexity-theoretic techniques for identifying the parameters that partition the input domain into ‘hard’ versus ‘easy’ inputs,8 with natural extensions to sampling-based approximation (Kwisthout, 2018).
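As a toy illustration of the parameterised idea (a hypothetical example of my own, not the techniques of the cited papers): suppose the only source of combinatorial blow-up is the number k of unobserved binary variables. Exact maximisation by brute force then costs on the order of 2^k evaluations, which is exponential in k in general yet perfectly feasible on any restricted domain of inputs where k stays small.

```python
# Hypothetical toy example of a parameterised analysis: brute-force
# maximisation over k unobserved binary variables costs O(2**k) evaluations
# of the scoring function -- exponential in k in general, but tractable on
# inputs where the parameter k is small.
from itertools import product

def most_probable_assignment(score, k):
    """Return the assignment of k binary variables maximising `score`, where
    `score` maps a tuple of k bits to an (unnormalised) posterior probability
    and stands in for whatever model one assumes."""
    return max(product((0, 1), repeat=k), key=score)

# Usage with a made-up scoring function over k = 4 binary variables.
best = most_probable_assignment(lambda bits: sum(bits) - 2 * bits[0], k=4)
print(best)  # (0, 1, 1, 1) for this made-up score
```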

If one truly believes that sampling algorithms provide a veridical algorithmic-level account of how people make probabilistic inferences, then identifying the “normal” or “ecologically relevant” conditions that render our sampling-based inferences tractable would seem an important research effort.

Notably, Bayesian modellers and theorists do consider domain assumptions to explain properties of human inference, e.g., by showing that relative to those domain assumptions the behaviour is (approximately) rational (e.g., Chater & Oaksford, 1994; Navarro & Perfors, 2011; Vul et al., 2014). Yet, what seems so far largely overlooked is that similar assumptions are needed to explain how that (approximate) rationality can be tractable (but see, e.g., Blokpoel et al., 2013).

What I am (not) saying

To prevent any misunderstanding, let me end by explicating what I am and am not saying.

I am not saying that the ‘sampling hypothesis’ is wrong as an algorithmic-level account of how human minds make probabilistic inferences. The hypothesis is not implausible; for what it’s worth, I personally find it a priori plausible. Moreover, it seems to have some substantive empirical support.

I am also not saying that Bayesian inference is an incorrect or non-veridical computational-level characterisation of problems solved by minds. Again, the hypothesis is not implausible and it has substantive empirical support. Moreover, it has explanatory power not (obviously) shared by non-rational accounts of cognition.

What I am saying is that intractability of Bayesian inference at the computational level cannot be overcome by ‘sampling as approximation’ by itself. It must be paired with additional assumptions about the input domain of the inference problem. These additional assumptions constrain the nature of the problem solved, and thereby its computational-level characterisation.

As Herbert Simon argued, the mind’s heuristics and its environment must fit together like two blades of a scissors; so too must sampling algorithms fit their particular domain of application to work well. It is by studying the two blades in tandem that we may understand how resource-bounded minds can tractably cope with the complexity and uncertainty of their life world.


Footnotes

1   Crucially, approximation notions typically pursued in computer science do not always make sense for cognitive science. For instance, value-approximation is commonly pursued in the former, but structure-approximation typically makes more sense in the latter, and the two notions are fully dissociable (van Rooij & Wareham, 2012).

2   This list is an updated and extended version of the 11 objections that I covered in The Tractable Cognition thesis (van Rooij, 2008).

3   Although the Average-Case Objection (point (5) in the list) may appear to apply as well, this appearance is illusory. The Average-Case Objection is about average-case versus worst-case analysis, which pertains to the distribution of (hard) instances in the input domain of an intractable computational-level inference problem, not to the average performance of a probabilistic (i.e., non-deterministic) algorithm on given inputs in that domain. The latter is covered in the main text. I will return to the topic of domain restriction later on.

4   A word of caution: This intuitive notion of ‘non-deterministic’ does not match the formal notion of non-determinism in theoretical computer science, which is better understood as “perfect guessing” with oracle powers (or, equivalently, unbounded parallelism). In contrast to probabilistic computation, non-deterministic “computation” in the formal, computer-science sense is not seen as a realistic notion of physically realisable computation at all.

5  I am not sure if I understand Johan Kwisthout’s comment here correctly, but it seems to be saying that even such convergence guarantees may require assumptions about the domain. Will have to check this with Johan after the break.

6  For instance, the Minimum Spanning Tree problem has a combinatorially complex search space larger even than the intractable Traveling Salesperson Problem, but it can nevertheless be tractably and exactly solved by a simple deterministic algorithm that even kids can figure out.
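For the curious, here is a sketch of one such simple algorithm, Kruskal’s greedy method (my choice of illustration; Prim’s algorithm would do just as well): sort the edges by weight and keep adding the cheapest edge that does not close a cycle.

```python
# Sketch of Kruskal's greedy algorithm for the Minimum Spanning Tree problem.
def minimum_spanning_tree(n, edges):
    """edges: iterable of (weight, u, v) tuples over vertices 0..n-1."""
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:      # the edge connects two separate components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

print(minimum_spanning_tree(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```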

7  At least, assuming some widely believed conjectures in theoretical computer science (P ≠ NP and P = BPP) that aren’t, as far as I know, contested by any cognitive scientists.

8  This is a bit of an oversimplification. At present, we only have techniques to partition into ‘for sure easy’ and ‘possibly hard’, with some false alarms in the latter set. Likely we cannot ever do better than this, unless we win a Millennium Prize and become millionaires.