\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{green}{RGB}{107,142,35} \newcommand{\green}[1]{{\color{green}{#1}}} \] \[ \definecolor{blue}{RGB}{0,0,205} \newcommand{\blue}[1]{{\color{blue}{#1}}} \] \[ \newcommand{\den}[1]{[\![#1]\!]} \] \[ \newcommand{\set}[1]{\{#1\}} \]

overview

 

  • what is probabilistic pragmatics?

 

  • example: vanilla rational speech act model

 

  • extensions & applications:
    • individual differences
    • embedded scalars

probabilistic pragmatics

what is probabilistic pragmatics?

  • definitely one thing: a piece of bad terminology
    • probabilities play a role, but they are not the main thing

 

  • blunt umbrella term for a set of approaches unified by family resemblance
    • game theoretic pragmatics, "Bayesian pragmatics"
    • modern post-Grice (Geurts, Lauer, Optimality or Relevance Theory)

 

  • gradient concept characterized by a property cluster

Franke & Jäger (2016), Probabilistic pragmatics, ZfS 35(1):3-44

key properties

  1. probabilistic
    • language users are usually uncertain about relevant contextual elements
    • probabilities are a good tool to capture uncertainty
  2. interactive
    • pragmatics is not only about readings of sentences
    • explicitly consider speaker and listener perspectives
  3. rationalistic
    • language use as goal-oriented, purposeful behavior
  4. computational
    • formally precise, implementable (likely: quantitative) formulations
  5. data-oriented
    • pragmatic theory feeds full data-generating model for experimental data

 

\(\Rightarrow\) Bayesian as a consequence of particular implementations of 1-3

levels of analysis

[figure: levels of analysis]

rational analysis

A rational analysis is an explanation of an aspect of human behavior based
on the assumption that it is optimized somehow to the structure of the 
environment. ... [T]he term does not imply any actual logical deduction in 
choosing optimal behavior, only that the behavior will be optimized. 
                                                    (Anderson 1991, p. 471)

example: reference game paradigm

  • speaker and listener observe a fixed set \(T\) of referents, e.g.:

    [figure: example referents]

  • speaker knows which referent \(t \in T\) she wants to talk about

  • speaker can choose a message \(m\) from set \(M = \{ \text{blue}, \text{green}, \text{square}, \text{circle} \}\)

  • listener tries to recover intended referent based on message

  • communication is successful if guess matches intended referent

[signaling game!]

rational speech act model


literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]) \]


Gricean speaker approximates informativity-maximization (with parameter \(\lambda\)):

\[ P_{S}(m \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


pragmatic listener uses Bayes' rule to infer likely world states:

\[ P_L(t \mid m \, ; \, \lambda) \propto P(t) \cdot P_S(m \mid t \, ; \, \lambda) \]

(cf. Benz 2006, Frank & Goodman 2012)
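
A minimal runnable sketch of these three definitions, in Python; the three-referent context is a hypothetical stand-in for the stimulus figure above:

import numpy as np

# hypothetical reference game in the spirit of Frank & Goodman (2012):
# three referents, four one-word messages, Boolean semantics sem[t, m]
referents = ["blue square", "blue circle", "green square"]
messages  = ["blue", "green", "square", "circle"]
sem = np.array([[1, 0, 1, 0],    # blue square
                [1, 0, 0, 1],    # blue circle
                [0, 1, 1, 0]],   # green square
               dtype=float)

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def rsa(sem, lam=1.0, prior=None):
    prior = np.ones(sem.shape[0]) / sem.shape[0] if prior is None else prior
    ll = normalize(sem * prior[:, None], axis=0)  # P_LL(t | m) ∝ P(t | [[m]])
    sp = normalize(ll ** lam, axis=1)             # P_S(m | t) ∝ exp(lam · log P_LL(t | m))
    pl = normalize(prior[:, None] * sp, axis=0)   # P_L(t | m) ∝ P(t) · P_S(m | t)
    return ll, sp, pl

ll, sp, pl = rsa(sem, lam=1.0)
print(np.round(pl, 2))  # "blue" now favors the blue square: the blue circle
                        # could have been called "circle"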

individual variation

simple & complex reference games

 

[figure: simple & complex reference game displays]

Franke & Degen (2016), Reasoning in reference games, PLoS ONE 11(5)

complexity of reasoning

[figure: reasoning chains]

individual variation in reasoning depth

[figure: model formulas]


[figure: predictions by reasoner type]

(cf. Camerer 2006, Franke 2011, Jäger 2014)

data

  • 60 subjects each for production & comprehension
  • 12 trials of each critical condition per subject
  • production:

    [figure: production results]

  • comprehension:

    [figure: comprehension results]

population-level modeling

  • maximum-likelihood fit of the RSA model to population-level data (see the sketch below)
  • correlation \(r = 0.997\), \(p < 0.0001\)
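
A minimal sketch of such a fit (function and variable names are illustrative), assuming production data as a count matrix counts[t, m] over targets and messages and the vanilla speaker rule from above:

import numpy as np
from scipy.optimize import minimize_scalar

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def fit_lambda(counts, sem):
    """ML estimate of the rationality parameter lambda from production
    counts[t, m] (how often message m was produced for target t),
    under the vanilla RSA speaker with a uniform referent prior."""
    prior = np.ones(sem.shape[0]) / sem.shape[0]
    def neg_log_lik(lam):
        ll = normalize(sem * prior[:, None], axis=0)  # literal listener
        sp = normalize(ll ** lam, axis=1)             # speaker P_S(m | t; lam)
        return -(counts * np.log(sp + 1e-12)).sum()   # multinomial log-likelihood
    return minimize_scalar(neg_log_lik, bounds=(0.01, 20), method="bounded").x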

     

[figure: population-level model fit]

data-generating model

[figure: graphical representation of the data-generating model]

results

[figure: by-subject results]


  • visual impression corroborated by Bayesian model comparison:
    • RSA-style Gricean-speaker only is tenable
    • RSA-style Gricean-listener only is not

upshot

 

  • how "rational" and "interactive" subjects are is to be settled by data
  • RSA-style pragmatics blends into statistical data-generating models

embedded scalars

a Quantity conflict


Quantity implicature: example

utterance: "I own some of Johnny Cash's albums."    
---implicates--->  I don't own them all.


Traditionalism

the reason S didn't say "all" is that it's not true


Grammaticalism

parse the sentence as:

I own O(some) of JC's albums.

where:

\[ O(x) = x \cap \bigcap_{y \in ALT^+(x)} \neg y \]

alternative


rational probabilistic reasoning about putative meaning enrichments

(joint work in progress with Leon Bergen)

nested Aristotelians

toy language

9 possible sentences

[none | some | all] of the monsters drank [none | some | all] of their water


7 possible worlds

100 · 010 · 110 · 011 · 101 · 001 · 111

[figure: one monster/water configuration per world]

semantics

    NN NS NA SN SS SA AN AS AA
100  0  1  1  1  0  0  1  0  0
110  0  0  1  1  1  0  0  0  0
101  0  0  0  1  1  1  0  0  0
111  0  0  0  1  1  1  0  0  0
010  1  0  1  0  1  0  0  1  0
011  1  0  0  0  1  1  0  1  0
001  1  0  0  0  1  1  0  1  1
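
A short Python sketch that reproduces this truth table, assuming that the three bits of a world label record which categories (drank none / some but not all / all of their water) are instantiated by at least one monster, and that "some" is construed weakly (at least some):

from itertools import product

# world bit i = 1 iff at least one monster is in category i,
# categories: N = drank none, S = drank some but not all, A = drank all
worlds = ["100", "110", "101", "111", "010", "011", "001"]

# weak construal of the embedded quantifier:
# 'drank none' = {N}, 'drank some' = {S, A}, 'drank all' = {A}
INNER = {"N": "N", "S": "SA", "A": "A"}

def true_in(q1, q2, world):
    """'[q1] of the monsters drank [q2] of their water', evaluated in world."""
    present = [c for c, bit in zip("NSA", world) if bit == "1"]
    hits = [c in INNER[q2] for c in present]
    return {"N": not any(hits), "S": any(hits), "A": all(hits)}[q1]

sentences = ["".join(p) for p in product("NSA", repeat=2)]  # NN, NS, ..., AA
for w in worlds:
    print(w, [int(true_in(q1, q2, w)) for q1, q2 in sentences])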

parses

000:       [ N | S | A]  of the ... drank   [ N | S | A]  of the ...
001:       [ N | S | A]  of the ... drank O([ N | S | A]) of the ...
010:     O([ N | S | A]) of the ... drank   [ N | S | A]  of the ...
011:     O([ N | S | A]) of the ... drank O([ N | S | A]) of the ...
100:  O(   [ N | S | A]  of the ... drank   [ N | S | A]  of the ...)
101:  O(   [ N | S | A]  of the ... drank O([ N | S | A]) of the ...)
110:  O( O([ N | S | A]) of the ... drank   [ N | S | A]  of the ...)
111:  O( O([ N | S | A]) of the ... drank O([ N | S | A]) of the ...)

lexical exhaustification

O(N) = N        O(A) = A        O(S) = "some but not all"    

sentential exhaustification

\[ O(x) = x \cap \bigcap_{y \in ALT^+(x)} \neg y \ \ \ , \text{if consistent} \]

where \(ALT^+(x)\) contains all sentences stronger than \(x\) that are obtainable by replacing N, S, or A for one another
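
A Python sketch of this operator applied at sentence level, over the literal truth table above; it treats "stronger" as strictly stronger and falls back to the plain denotation when exhaustification would be inconsistent (the latter is one way to read the "if consistent" clause):

# denotations from the 'semantics' table: each sentence's set of worlds
worlds    = ["100", "110", "101", "111", "010", "011", "001"]
sentences = ["NN", "NS", "NA", "SN", "SS", "SA", "AN", "AS", "AA"]
truth = [[0,1,1,1,0,0,1,0,0],
         [0,0,1,1,1,0,0,0,0],
         [0,0,0,1,1,1,0,0,0],
         [0,0,0,1,1,1,0,0,0],
         [1,0,1,0,1,0,0,1,0],
         [1,0,0,0,1,1,0,1,0],
         [1,0,0,0,1,1,0,1,1]]
den = {s: {w for w, row in zip(worlds, truth) if row[j]}
       for j, s in enumerate(sentences)}

def O(x):
    """Remove every world covered by a strictly stronger N/S/A-alternative;
    fall back to the plain denotation if nothing would remain."""
    stronger = [den[y] for y in sentences if y != x and den[y] < den[x]]
    exh = den[x] - set().union(*stronger) if stronger else den[x]
    return exh if exh else den[x]

print(O("SS"))  # {'110'}: the globally exhaustified reading of 'some ... some'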

possible readings

   sentence parse s100 s110 s101 s111 s010 s011 s001
       NN  p000    0    0    0    0    1    1    1
       NN  p100    0    0    0    0    1    1    0       
       
       NS  p000    1    0    0    0    0    0    0
       NS  p001    1    0    1    0    0    0    1
       NS  p101    0    0    1    0    0    0    1
       
      xNA  p000    1    1    0    0    1    0    0
      xNA  p100    0    1    0    0    1    0    0
      
       SN  p000    1    1    1    1    0    0    0
       SN  p010    0    1    1    1    0    0    0
       
       SS  p000    0    1    1    1    1    1    1
       SS  p001    0    1    0    1    1    1    0
       SS  p010    0    1    1    1    0    0    0
       SS  p011    0    1    0    1    0    1    0
       SS  p100    0    1    0    0    0    0    0
       
       SA  p000    0    0    1    1    0    1    1
       SA  p010    0    0    1    1    0    1    0
       
       AS  p000    0    0    0    0    1    1    1
       AS  p100    0    0    0    0    1    1    0
       AS  p001    0    0    0    0    1    0    0

disambiguation


Traditionalism

  • only "parses" 000 and 100
  • contextual selection given:
    • speaker knowledge
    • contextual relevance


Grammaticalism

  • all parses possible
  • some unspecified syntactic disambiguation mechanism

a bunch of models

rational speech act model


literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]) \]


Gricean speaker approximates informativity-maximization (with parameter \(\lambda\)):

\[ P_{S}(m \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


pragmatic listener uses Bayes' rule to infer likely world states:

\[ P_L(t \mid m \, ; \, \lambda) \propto P(t) \cdot P_S(m \mid t \, ; \, \lambda) \]

(cf. Benz 2006, Frank & Goodman 2012)

lexical uncertainty

idea: reasoning over the speaker's lexicon \(\red{l}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{l}}) \]


\[ P_{S}(m \mid t \, , \red{l} \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{l} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{l}) \cdot P_{S}(m \mid t \, , \red{l} \, ; \, \lambda) \]


\[ \red{l} \in \set{\text{p000}, \text{p011}} \]

(cf. Bergen et al. to appear, Potts et al. 2015)
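
A Python sketch of these three definitions, assuming the semantics is passed in as a Boolean array over lexica, states and messages (the array layout and names are illustrative):

import numpy as np

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def lexical_uncertainty_listener(sems, lam=1.0, t_prior=None, l_prior=None):
    """sems[l, t, m] = 1 iff message m is true of state t under lexicon l
    (here l ranges over p000 and p011). Returns P_L(t, l | m) as an array
    of shape (L, T, M); assumes no message is contradictory and no state
    is undescribable under any lexicon."""
    L, T, M = sems.shape
    t_prior = np.ones(T) / T if t_prior is None else t_prior
    l_prior = np.ones(L) / L if l_prior is None else l_prior
    ll = normalize(sems * t_prior[None, :, None], axis=1)  # P_LL(t | m) ∝ P(t | [[m]]^l)
    sp = normalize(ll ** lam, axis=2)                      # P_S(m | t, l): normalized over m only
    joint = l_prior[:, None, None] * t_prior[None, :, None] * sp
    return normalize(joint, axis=(0, 1))                   # P_L(t, l | m) ∝ P(t) P(l) P_S(m | t, l)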

speaker-intended lexical narrowing

idea: speaker can choose to intend a lexical narrowing \(\red{l}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{l}}) \]


\[ P_{S}(m, \red{l} \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{l} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{l}) \cdot P_{S}(m, \red{l} \mid t \, ; \, \lambda) \]


\[ \red{l} \in \set{\text{p000}, \text{p001}, \text{p010}, \text{p011}} \]

speaker-intended parses

idea: speaker can choose to intend a syntactic parse \(\red{p}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{p}}) \]


\[ P_{S}(m, \red{p} \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{p} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{p}) \cdot P_{S}(m, \red{p} \mid t \, ; \, \lambda) \]


\[ \red{p} \in \set{\text{p000}, \text{p001}, \text{p010}, \text{p011}, \text{p100}, \text{p101}, \text{p110}, \text{p111}} \]
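
A Python sketch under the same assumptions as before; the only structural change is that the speaker now chooses message and parse jointly, so the speaker distribution is normalized over \((m, p)\) rather than over \(m\) alone. Restricting the parse set to p000-p011 gives the speaker-intended lexical narrowing model above:

import numpy as np

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def intended_parse_listener(sems, lam=1.0, t_prior=None, p_prior=None):
    """sems[p, t, m] = 1 iff message m is true of state t under parse p.
    Returns P_L(t, p | m) as an array of shape (P, T, M)."""
    P, T, M = sems.shape
    t_prior = np.ones(T) / T if t_prior is None else t_prior
    p_prior = np.ones(P) / P if p_prior is None else p_prior
    ll = normalize(sems * t_prior[None, :, None], axis=1)  # P_LL(t | m) ∝ P(t | [[m]]^p)
    sp = normalize(ll ** lam, axis=(0, 2))                 # P_S(m, p | t): joint over messages AND parses
    joint = p_prior[:, None, None] * t_prior[None, :, None] * sp
    return normalize(joint, axis=(0, 1))                   # P_L(t, p | m) ∝ P(t) P(p) P_S(m, p | t)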

disambiguation by logical strength

idea: weighted parses introduce graded semantics


\[ P_S(m \mid t \, ; \, \lambda) \propto \sum_{p \in \mathcal{P}} \exp(\lambda \cdot w(m,p)) \cdot \delta_{t \in \den{m}^p} \]


\(w(m,p)\) is the rank of \(\den{m}^p\) in the ordering of \(\set{\den{m}^p \mid p \in \mathcal{P}}\) by logical strength
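
A Python sketch of this speaker rule, with some assumptions the slide leaves implicit: logically stronger readings get larger weights \(w\), denotation size stands in for logical strength (the readings of a single sentence are nested here, so the two orderings coincide), and the rule is normalized over messages:

import numpy as np

def strength_weights(dens):
    """dens: denotations (sets of worlds) of one message under all parses.
    w = rank of the reading by logical strength; the strongest distinct
    reading (smallest denotation) gets the largest rank."""
    distinct = sorted({frozenset(d) for d in dens}, key=len)  # strongest first
    rank = {d: len(distinct) - i for i, d in enumerate(distinct)}
    return [rank[frozenset(d)] for d in dens]

def strongest_meaning_speaker(dens_by_message, t, lam=1.0):
    """P_S(m | t) ∝ sum_p exp(lam * w(m, p)) * 1[t in [[m]]^p];
    dens_by_message holds one list of per-parse denotations per message."""
    scores = np.array([
        sum(np.exp(lam * w) for w, d in zip(strength_weights(dens), dens) if t in d)
        for dens in dens_by_message
    ])
    return scores / scores.sum()  # assumes t is describable by at least one reading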

experiment

design & participants

  • 80 participants via MTurk
  • two parts:
    1. production
    2. comprehension

[screenshot: experiment interface]

production

[screenshots: production trials]

comprehension

[screenshot: comprehension trial]

results: production

results: comprehension

      100  110  101  111  010  011  001
 NN   0.26 0.09 0.06 0.05 0.20 0.15 0.19
 NS   0.53 0.07 0.12 0.05 0.05 0.04 0.14
xNA   0.29 0.25 0.03 0.04 0.30 0.04 0.05
 SN   0.15 0.25 0.26 0.24 0.03 0.04 0.03
 SS   0.02 0.23 0.11 0.21 0.14 0.22 0.06
 SA   0.03 0.04 0.26 0.25 0.03 0.25 0.14
 AN   0.69 0.07 0.05 0.04 0.05 0.03 0.06
 AS   0.03 0.04 0.03 0.05 0.41 0.27 0.17
 AA   0.07 0.03 0.05 0.06 0.04 0.05 0.71

model comparison

model comparison by BIC


derived approximate Bayes factors

           lex_int  exh      SM       lex_unc  rsa
lex_int    1        -        -        -        -
exh        8.89     1        -        -        -
SM         71       7.98     1        -        -
lex_unc    239      26.9     3.37     1        -
rsa        6.63e+09 7.45e+08 9.33e+07 2.76e+07 1
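
A standard way to derive approximate Bayes factors from BIC scores, presumably what is meant here:

\[ \text{BF}_{ij} \approx \exp\left( \tfrac{1}{2} \, (\text{BIC}_j - \text{BIC}_i) \right) \]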

preferred parses

parse-choice model: \(P_L(p \mid m ; \hat{\lambda})\)

     p000  p001  p010  p011  p100  p101  p110  p111
 NN  0.131 0.131 0.131 0.131 0.119 0.119 0.119 0.119
 NS  0.105 0.158 0.105 0.158 0.105 0.132 0.105 0.132
xNA  0.133 0.133 0.133 0.133 0.117 0.117 0.117 0.117
 SN  0.136 0.136 0.121 0.121 0.121 0.121 0.121 0.121
 SS  0.159 0.139 0.118 0.121 0.085 0.139 0.118 0.121
 SA  0.134 0.134 0.122 0.122 0.122 0.122 0.122 0.122
 AN  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
 AS  0.151 0.106 0.151 0.106 0.136 0.106 0.136 0.106
 AA  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

weights in strongest meaning model:

strongest   2nd     3rd     4th
0.362       0.274   0.207   0.156

upshot

 

  • probabilistic pragmatics can complement "traditional" analyses
  • model comparison is paramount

conclusions

conclusions


  • probabilistic pragmatics connects:
    • linguistic theory,
    • statistical analyses, and
    • experimental data
  • individuated by a cluster concept:
    • probabilistic
    • interactive
    • rationalistic
    • computational
    • data-oriented
