<TeXmacs|1.99.7>

<style|<tuple|article|cite-author-year|std-latex>>

<\body>
  <\hide-preamble>
    <assign|n|<macro|1|<with|math-font-series|bold|<math|<arg|1>>>>>

    <new-theorem|res|Result>

    <new-theorem|pro|Proposition>

    <new-theorem|lem|Lemma>

    <new-theorem|thm|Theorem>

    <new-theorem|defn|Definition>

    <assign|be|<macro|>>

    <assign|ee|<macro|>>

    <\assign|beq>
      <\macro|1>
        <label|1>
      </macro>
    </assign>

    <assign|eeq|<macro|>>

    <assign|ba|<macro|>>

    <assign|ea|<macro|>>

    <assign|*|<macro|<argmax>>>
  </hide-preamble>

  <doc-data|<doc-title|Inferences in Bayesian variable selection problems
  with large model spaces>|<doc-author|<author-data|<author-name|García-Donato,
  G<rsup|<math|1>> and Martínez-Beneito, MA<rsup|<math|2>><next-line><with|font-size|0.71|<rsup|<math|1>>
  Universidad de Castilla La Mancha, Spain, <rsup|<math|2>> CSISP Valencia,
  Spain>>>>|<doc-date|<date|>>>

  <abstract-data|<\abstract>
    An important aspect of Bayesian model selection is how to deal with huge
    model spaces, since exhaustive enumeration of all the models entertained
    is infeasible and inferences have to be based on the very small
    proportion of models visited. This is the case for the variable selection
    problem with a moderate to large number of potential explanatory
    variables, which is the setting considered in this paper. We review some
    of the strategies proposed in the literature and argue that inferences
    based on empirical frequencies, obtained via Markov Chain Monte Carlo
    sampling of the posterior distribution, outperform recently proposed
    searching methods. We give a plausible yet very simple explanation of
    this effect, showing that estimators based on frequencies are unbiased.
    The results obtained in two illustrative examples provide strong evidence
    in favor of our arguments.<next-line>

    Keywords: Bayesian model selection, Searching strategies, g-priors
  </abstract>>

  <graphicspath|plots//>

  <section|Inferences in large model spaces><label|sec.est>

  This paper is concerned with the model selection problem, that is, with
  the uncertainty about which probabilistic model, from an initial set
  <math|\<cal-M\>> of candidates, best explains certain data <math|<n|y>>.
  In particular, we address the variable selection problem, where the
  competing models differ in which subset of variables is included as
  explanatory covariates for <math|<n|y>>.

  One special characteristic of the variable selection problem is that
  <math|\<cal-M\>>, the model space, easily becomes extremely large. For
  instance, a problem with <math|p=40> potential covariates has
  <math|2<rsup|40>\<approx\>10<rsup|12>> different models. The mere binary
  representation of such a model space would occupy 5 terabytes of memory.
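  The size figures above can be checked with a few lines of Python. This is
  only an illustrative sketch: the helper names
  <verbatim|model_space_size> and <verbatim|binary_storage_bytes> are
  hypothetical, and each model is assumed to be stored as a bare
  <math|p>-bit inclusion vector with no overhead. Under that assumption the
  storage for <math|p=40> is <math|40\<cdot\>2<rsup|40>/8=5\<cdot\>2<rsup|40>>
  bytes, that is, 5 tebibytes (about 5.5 terabytes).

```python
def model_space_size(p):
    """Number of candidate models, 2**p, for p potential covariates."""
    return 2 ** p


def binary_storage_bytes(p):
    """Bytes needed to store every model as a p-bit inclusion vector (no overhead)."""
    return model_space_size(p) * p // 8


if __name__ == "__main__":
    p = 40
    print(f"{model_space_size(p):.3e} models")             # 1.100e+12 models
    print(f"{binary_storage_bytes(p) / 2**40:.1f} TiB")    # 5.0 TiB
```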

  We focus on the difficulties that arise as a consequence of the very large
  size of <math|\<cal-M\>>. We consider the problem from a Bayesian point of
  view and the context we use for the development of our ideas is the problem
  of variable selection in Gaussian regression models.

  The Bayesian approach to the problem is conceptually straightforward. Any
  feature of interest, say <math|\<tau\>>, is a deterministic function of the
  posterior distribution over the model space. Examples of such features are
  the highest posterior probability model (hereafter HPM), the inclusion
  probabilities of covariates or posterior predictions of a new value of the
  dependent variable. Unfortunately, three major difficulties arise when
  putting the Bayesian approach into practice: i) the choice of the prior
  distributions; ii) the computation of the integrated likelihood (or,
  equivalently, the Bayes factors) for single models in <math|\<cal-M\>>;
  and, in large model spaces, iii) the <with|font-shape|italic|estimation>
  of <math|\<tau\>>, since its exact value cannot be computed in practice
  owing to the size of <math|\<cal-M\>>. Benchmark papers for each of these
  areas of research are, respectively, <cite-textual|BerPer01>,
  <cite|ChibJeliazkov01> and <cite|GeorgeMcCulloch97>.

  Our work is mainly concerned with iii), which is intimately related to
  strategies for exploring the model space (i.e. visiting a small proportion
  of models, hereafter denoted <math|\<cal-M\><rsup|\<ast\>>>), since
  covering the whole model space is infeasible. Our main aim is to shed new
  light on a topic (almost an implicit debate) that appears from time to
  time in the literature (see references below). The question concerns the
  estimation of <math|\<tau\>>, and more precisely whether it should be
  based on the <with|font-shape|italic|empirical> distribution (observed
  frequencies in <math|\<cal-M\><rsup|\<ast\>>>) or on the normalized Bayes
  factors of models in <math|\<cal-M\><rsup|\<ast\>>>. In the first
  approach, which until quite recently was the general one, models in
  <math|\<cal-M\><rsup|\<ast\>>> are visited according to a sampling scheme
  whose stationary distribution is the posterior distribution over
  <math|\<cal-M\>>; Markov Chain Monte Carlo methods are commonly used for
  this task. In the second approach the emphasis is placed on visiting good
  models (i.e. models with high posterior probability), usually without
  replacement. In the rest of the paper, for ease of comprehension, we
  refer to these two approaches as <with|font-shape|italic|empirical> and
  <with|font-shape|italic|re-normalized>, respectively. Typical
  <with|font-shape|italic|empirical> methods combine MCMC sampling from the
  posterior distribution with the estimation of <math|\<tau\>> via
  frequencies. The <with|font-shape|italic|re-normalized> approach, on the
  other hand, combines algorithms that search for good models with
  estimates obtained from the <with|font-shape|italic|re-normalized>
  analytical expression of the Bayes factors.
  <with|font-shape|italic|Empirical> methods have been proposed and used by
  <cite|GeoMc93>, <cite|GeorgeMcCulloch97>, <cite|KuoMal98>,
  <cite|Deletal00>, <cite|NottKohn05>, <cite|Nt02>, <cite|Nt09> and
  <cite|CasMor06> (to mention just a few). Papers more in favor of the
  <with|font-shape|italic|re-normalized> approach are <cite|Clyetal10>,
  <cite|BerMol05>, <cite|CarSc09> and <cite|ScCar09>.

  <with|font-shape|italic|Re-normalized> methods are motivated by the sound
  argument that, in such huge model spaces, the frequency of visits is a
  poor basis for estimation, since the number of repeated visits (if any) is
  very small. Several authors (see e.g. <cite-raw|Clyetal10> and
  <cite-raw|ScCar09>) have argued in favor of the superiority of these
  procedures over the <with|font-shape|italic|empirical> ones. Nevertheless,
  as we explain below, our experience is quite the opposite: in general,
  <with|font-shape|italic|empirical> estimates outperform their
  <with|font-shape|italic|re-normalized> counterparts in key aspects. Our
  explanation for this effect is simple: <with|font-shape|italic|empirical>
  estimators are particular cases of <with|font-shape|italic|probability
  proportional to size sampling (PPS) estimators> (see <cite-raw|Lohr99>),
  and hence unbiased. Furthermore, these estimators have an associated
  measure of precision, which can be very useful for the problem at hand. An
  additional, well-known appealing property of
  <with|font-shape|italic|empirical> methods is that, although exploring
  high-probability models is not their ultimate goal, such models appear
  more frequently simply because they are more probable.

  Throughout this paper we develop and formalize the ideas outlined above.
  The paper is organized as follows.

  In Section<nbsp><reference|Zellner> we formulate the variable selection
  problem considered. In order to keep the impact of difficulties i) and ii)
  above under control, we use Zellner's <math|g>-priors
  <cite-parenthesized|Zellner86>, which produce Bayes factors in closed
  form; the corresponding formulae are also briefly given in
  Section<nbsp><reference|Zellner>. In Section<nbsp><reference|sec.slm> the
  usual methods for obtaining <math|\<cal-M\><rsup|\<ast\>>> are reviewed. In
  Section<nbsp><reference|pss>, <with|font-shape|italic|empirical> estimators
  are expressed as PPS estimators and several of their properties are shown
  and discussed. In Section<nbsp><reference|sec.exact> we present the exact
  results of a moderately large problem (Ozone35) with <math|p=35>
  covariates, obtained by parallel computing with optimized C routines
  written for the occasion. We then compare these exact results with those
  obtained with <with|font-shape|italic|empirical> and
  <with|font-shape|italic|re-normalized> estimators, showing strong evidence
  in favor of the former. In Section<nbsp><reference|sec.larger> we analyze
  a much larger dataset (Ozone65) with 65 covariates, for which the exact
  answer is not available. Finally, Section<nbsp><reference|sec.Ext>
  contains a summary of the main conclusions of this paper.

  <section|Bayesian variable selection><label|Zellner>

  Let <math|<n|X>=<around|{|x<rsub|i*j>|}>> be an <math|N\<times\>p> full
  rank matrix and <math|<n|\<gamma\>>=<around|(|\<gamma\><rsub|1>,\<ldots\>,\<gamma\><rsub|p>|)>>
  be a <math|p>-dimensional vector of binary variables. Denote
  <math|k<rsub|\<gamma\>>=<big|sum>\<gamma\><rsub|i>> and for each
  <math|<n|\<gamma\>>>, let <math|<n|X><rsub|\<gamma\>>> be the
  <math|N\<times\>k<rsub|\<gamma\>>> design matrix corresponding to the
  columns with ones in <math|<n|\<gamma\>>>.

  The variable selection problem we consider has <math|2<rsup|p>> competing
  models, each proposed as a plausible explanation of an
  <math|N>-dimensional vector <math|<n|Y>>. More precisely,

  <\equation>
    <label|TheProblem>M<rsub|\<gamma\>>:<n|Y>\<sim\>N<rsub|N>*<around|(|\<alpha\><n|1>+<n|X><rsub|\<gamma\>><n|\<beta\>><rsub|\<gamma\>>,\<sigma\><rsup|2><n|I>|)>,<space|.2cm><n|\<gamma\>>\<in\><around|{|0,1|}><rsup|p>.
  </equation>

  In this problem, the model space <math|\<cal-M\>> can be represented by
  <math|<around|{|0,1|}><rsup|p>>. The simplest model among the proposed ones
  is

  <\equation*>
    M<rsub|0>:<n|Y>\<sim\>N<rsub|N><around|(|\<alpha\><n|1>,\<sigma\><rsup|2><n|I>|)>.
  </equation*>

  Using <math|M<rsub|0>> as the reference model, which entails no loss of
  generality, the posterior probabilities of the models can be expressed as

  <\equation>
    <label|postprob>P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>=C*<space|0.17em>B<rsub|\<gamma\>*0>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>,
  </equation>

  where <math|B<rsub|\<gamma\>*0>> is the Bayes factor of
  <math|M<rsub|\<gamma\>>> to <math|M<rsub|0>>,
  <math|P*r<around|(|M<rsub|\<gamma\>>|)>> is the prior probability of
  <math|M<rsub|\<gamma\>>> and

  <\equation*>
    C<rsup|-1>=<big|sum><rsub|\<gamma\>><space|0.17em>B<rsub|\<gamma\>*0>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>,
  </equation*>

  is the constant of proportionality.
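  When every model can be enumerated, (<reference|postprob>) can be
  evaluated directly. The following Python sketch (the function name
  <verbatim|posterior_probs> is a hypothetical choice) works on the log
  scale, since Bayes factors in large problems easily overflow double
  precision; the constant <math|C> is obtained with the log-sum-exp trick.

```python
import math


def posterior_probs(log_bf, log_prior):
    """Posterior model probabilities C * B_{gamma 0} * Pr(M_gamma) on the log scale.

    log_bf[i] is log B_{gamma_i 0} and log_prior[i] is log Pr(M_{gamma_i}).
    The normalizing constant is computed with the log-sum-exp trick, so very
    large Bayes factors do not overflow.
    """
    log_w = [b + q for b, q in zip(log_bf, log_prior)]
    m = max(log_w)
    log_c_inv = m + math.log(sum(math.exp(w - m) for w in log_w))
    return [math.exp(w - log_c_inv) for w in log_w]


# Hypothetical example: three models with equal prior probabilities.
probs = posterior_probs([0.0, math.log(3.0), math.log(6.0)], [math.log(1 / 3)] * 3)
print([round(x, 3) for x in probs])   # [0.1, 0.3, 0.6]
```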

  Bayes factors are the ratio of the marginal prior predictive distributions
  evaluated at <math|<n|y>>, that is, <math|B<rsub|\<gamma\>*0>=m<rsub|\<gamma\>><around|(|<n|y>|)>/m<rsub|0><around|(|<n|y>|)>>,
  where

  <\equation*>
    m<rsub|\<gamma\>><around|(|<n|y>|)>=<big|int>N<rsub|N>*<around|(|<n|y>\<mid\>\<alpha\><n|1>+<n|X><rsub|\<gamma\>><n|\<beta\>><rsub|\<gamma\>>,\<sigma\><rsup|2><n|I>|)>*<space|0.17em>\<pi\><rsub|\<gamma\>><around|(|\<alpha\>,<n|\<beta\>><rsub|\<gamma\>>,\<sigma\>|)>*<space|0.17em>d*\<alpha\>*<space|0.17em>d<n|\<beta\>><rsub|\<gamma\>><space|0.17em>d*\<sigma\>.
  </equation*>

  The function <math|\<pi\><rsub|\<gamma\>>> is the prior distribution for
  the parameters under model <math|M<rsub|\<gamma\>>>. It is well known that
  the prior for the model-specific parameters (here
  <math|<n|\<beta\>><rsub|\<gamma\>>>) can be neither improper nor
  <with|font-shape|italic|vague> (i.e. with a very large variance), since
  otherwise the resulting Bayes factors are essentially arbitrary (see
  <cite-raw|BerPer01>). Our default choice for this prior is the
  <math|g>-prior proposed by <cite-textual|Zellner86>:

  <\equation*>
    \<pi\><rsub|\<gamma\>><around|(|\<alpha\>,<n|\<beta\>><rsub|\<gamma\>>,\<sigma\>\<mid\>g|)>=\<sigma\><rsup|-1>*<space|0.17em>N<rsub|k<rsub|\<gamma\>>>*<around|(|<n|\<beta\>><rsub|\<gamma\>>\<mid\><n|0>,g*\<sigma\><rsup|2><space|0.17em><around|(|<n|X><rsub|\<gamma\>><rsup|t><around|(|<n|I>-N<rsup|-1><n|1><n|1><rsup|t>|)><n|X><rsub|\<gamma\>>|)><rsup|-1>|)>,
  </equation*>

  for <math|<n|\<gamma\>>\<ne\><n|0>> and
  <math|\<pi\><rsub|0><around|(|\<alpha\>,\<sigma\>|)>=\<sigma\><rsup|-1>>
  for <math|M<rsub|0>>.

  The <math|g>-priors appear to be largely inspired by Jeffreys' ideas
  <cite-parenthesized|Jef61> and by the corresponding extension to
  regression problems in <cite|ZellSiow80> and <cite|ZellSiow84>. The
  assignment of the constant <math|g> has been analyzed by several authors
  (see <cite-raw|liang08> and references therein). The parameter <math|g>
  must increase with <math|N> to avoid an asymptotically degenerate prior.
  The default assignment <math|g=N> gives rise to a `unit information'
  prior, in the sense that the prior covariance matrix is scaled by the
  sample size so that the prior carries roughly the information contained
  in a single observation (see <cite-raw|Raf98>).

  An alternative to fixing the constant <math|g> is to assign,
  hierarchically, a proper prior to <math|g>, say
  <math|\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>>. In general, the resulting
  prior for <math|<n|\<beta\>><rsub|\<gamma\>>> has heavy tails, an
  appealing characteristic of a model selection prior that is related to
  properties such as information consistency (<cite-raw|BayGar08>).
  Examples of such priors are the multivariate Cauchy prior for
  <math|<n|\<beta\>><rsub|\<gamma\>>> proposed by Jeffreys, Zellner and Siow
  (<cite-raw|Jef61>, <cite-raw|ZellSiow80> and <cite-raw|ZellSiow84>), which
  corresponds to <math|\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>=<text|Gamma><rsup|-1>*<around|(|1/2,N/2|)>>,
  the hyper-<math|g> priors of <cite|liang08>, the Conventional Robust prior
  in <cite|forte08> and the extension of the <math|g>-prior in
  <cite|MarGeo10>.

  The <math|g>-prior provides closed-form expressions for the Bayes factors.
  In fact it can easily be shown that

  <\equation>
    <label|gBF>B<rsub|\<gamma\>*0><around|(|g|)>=<around*|(|1+g*<frac|S*S*E<rsub|\<gamma\>>|S*S*E<rsub|0>>|)><rsup|-<around|(|N-1|)>/2>*<space|0.17em><around|(|1+g|)><rsup|<around|(|N-k<rsub|\<gamma\>>-1|)>/2>,
  </equation>

  where <math|S*S*E<rsub|\<gamma\>>> is the sum of the squared errors for
  <math|M<rsub|\<gamma\>>>. Therefore, if all models <math|<n|\<gamma\>>> can
  be visited, their posterior probabilities can be calculated without great
  computational effort. Interestingly, the proposals in <cite|forte08> and
  <cite|MarGeo10> also lead to closed-form Bayes factors.
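  A minimal Python sketch of (<reference|gBF>) follows, assuming NumPy is
  available and that <math|<n|X>> holds the candidate covariates without an
  intercept column; the function name <verbatim|log_bf_gprior> is
  hypothetical. It returns the logarithm of
  <math|B<rsub|\<gamma\>*0><around|(|g|)>> with the default <math|g=N>, and
  <math|S*S*E<rsub|\<gamma\>>> is taken from an ordinary least-squares fit
  that includes the intercept, so that <math|S*S*E<rsub|0>> is the total sum
  of squares around the mean.

```python
import numpy as np


def log_bf_gprior(y, X, gamma, g=None):
    """log B_{gamma 0}(g) from the closed-form g-prior Bayes factor (gBF).

    y: response of length N; X: N x p covariate matrix (no intercept column);
    gamma: 0/1 inclusion vector of length p; g defaults to N (unit information).
    """
    N = len(y)
    g = N if g is None else g
    gamma = np.asarray(gamma, dtype=bool)
    k = int(gamma.sum())
    if k == 0:
        return 0.0                                   # B_00 = 1
    sse0 = float(np.sum((y - y.mean()) ** 2))        # SSE of the null model M_0
    Z = np.column_stack([np.ones(N), X[:, gamma]])   # intercept + selected columns
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sse_gamma = float(resid @ resid)
    return (-(N - 1) / 2 * np.log1p(g * sse_gamma / sse0)
            + (N - k - 1) / 2 * np.log1p(g))


# Hypothetical data: only the first of three covariates matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(size=50)
print(log_bf_gprior(y, X, [1, 0, 0]), log_bf_gprior(y, X, [0, 1, 0]))
```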

  Notice that, independently of the approach adopted to construct the
  inferences (either <with|font-shape|italic|empirical> or
  <with|font-shape|italic|re-normalized>), the above expression can easily
  be used to unequivocally identify the model that, within a given set of
  models, has the largest posterior probability, since this is the model
  with the largest <math|B<rsub|\<gamma\>*0><around|(|g|)>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>>
  (simply the largest <math|B<rsub|\<gamma\>*0><around|(|g|)>> when all
  models are a priori equally probable).

  <section|Search in large model spaces><label|sec.slm>

  Following the distinction introduced in Section<nbsp><reference|sec.est>,
  <with|font-shape|italic|empirical> methods use the relative frequencies of
  models visited in a subset <math|\<cal-M\><rsup|\<ast\>>\<subset\>\<cal-M\>>
  as the basis for the estimation of <math|\<tau\>> (the quantity of
  interest). <with|font-shape|italic|Re-normalized> methods, on the other
  hand, base this estimation on the re-normalized expression of the Bayes
  factors for models in <math|\<cal-M\><rsup|\<ast\>>>. Clearly, the way
  <math|\<cal-M\><rsup|\<ast\>>> is obtained can vary from one approach to
  the other. In this section we briefly review the methods used in the two
  approaches.

  <paragraph|<math|\<cal-M\><rsup|\<ast\>>> in
  <with|font-shape|italic|empirical> methods> Markov Chain Monte Carlo (MCMC)
  methods have provided decisive numerical support for the development of
  Bayesian methods over the last two decades. Bayesian model selection is no
  exception, and the literature devoted to MCMC strategies for solving the
  problem is extensive. When estimates are based on the frequency of
  visits, the visited models must form (approximately) a sample from the
  posterior distribution, and MCMC methods are an essential tool for
  generating such a sample.

  A great majority of these proposals are to a certain extent based on the
  seminal work by <cite|GeoMc93>, greatly improved and extended in
  <cite|GeorgeMcCulloch97>. Some interesting contributions in this area are
  <cite|KuoMal98>, <cite|Deletal00>, <cite|NottKohn05>, <cite|Nt02>,
  <cite|Nt09> and <cite|CasMor06>.

  In Appendix<nbsp><reference|ApGS> we describe the sampling strategy we
  propose for the model selection problem in (<reference|TheProblem>) with
  hierarchical <math|g>-priors. This is a straightforward Gibbs sampling
  scheme that takes advantage of the integrated expression in
  (<reference|gBF>). The particular case for <math|g>-priors that we used in
  the examples had already been suggested by <cite|GeorgeMcCulloch97>. This
  sampling scheme can become extremely efficient in combination with
  updating identities for the SSEs (see <cite-raw|Gentle07> and references
  therein), since each of its steps either adds or deletes a single
  variable.
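  As a rough illustration only, and not a reproduction of the scheme in
  Appendix<nbsp><reference|ApGS>, the following Python sketch implements a
  generic component-wise Gibbs sampler over <math|<n|\<gamma\>>> for a fixed
  <math|g> (default <math|g=N>) under the added assumption of a uniform
  prior over <math|\<cal-M\>>, so that the full conditional of each
  <math|\<gamma\><rsub|j>> depends only on the ratio of two Bayes factors.
  For simplicity it recomputes each SSE by least squares instead of using
  the updating identities mentioned above, so it is far slower than an
  optimized implementation. NumPy is assumed and all names are hypothetical.

```python
import numpy as np


def log_bf_gprior(y, X, gamma, g):
    """log B_{gamma 0}(g) from (gBF); gamma is a boolean inclusion vector."""
    N, k = len(y), int(gamma.sum())
    if k == 0:
        return 0.0
    sse0 = float(np.sum((y - y.mean()) ** 2))
    Z = np.column_stack([np.ones(N), X[:, gamma]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return (-(N - 1) / 2 * np.log1p(g * float(resid @ resid) / sse0)
            + (N - k - 1) / 2 * np.log1p(g))


def gibbs_models(y, X, n_iter, g=None, seed=0):
    """Component-wise Gibbs sampler on gamma in {0,1}^p (fixed g, uniform model prior).

    Each step compares the current model with and without covariate j, i.e. a
    variable is either added or deleted, and draws gamma_j from its full
    conditional. Returns an n_iter x p boolean array of sampled models.
    """
    rng = np.random.default_rng(seed)
    N, p = X.shape
    g = N if g is None else g
    gamma = np.zeros(p, dtype=bool)            # start from the null model M_0
    draws = np.empty((n_iter, p), dtype=bool)
    for t in range(n_iter):
        for j in range(p):
            gamma[j] = True
            l1 = log_bf_gprior(y, X, gamma, g)
            gamma[j] = False
            l0 = log_bf_gprior(y, X, gamma, g)
            # Pr(gamma_j = 1 | rest, y) = 1 / (1 + exp(l0 - l1)), computed stably
            prob1 = np.exp(-np.logaddexp(0.0, l0 - l1))
            gamma[j] = bool(rng.binomial(1, prob1))
        draws[t] = gamma
    return draws


# Hypothetical data in which only the first covariate is relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + rng.normal(size=100)
draws = gibbs_models(y, X, n_iter=200)
print(draws.mean(axis=0))   # empirical inclusion frequencies
```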

  With MCMC sampling, starting from an initial model
  <math|<n|\<gamma\>><rsup|<around|(|0|)>>>, we obtain a sample of models
  <math|\<cal-M\><rsup|\<ast\>>=<around|{|<n|\<gamma\>><rsup|<around|(|1|)>>,<n|\<gamma\>><rsup|<around|(|2|)>>,\<ldots\>,<n|\<gamma\>><rsup|<around|(|n|)>>|}>>
  from a Markov chain whose stationary distribution is
  <math|P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>>. This is the key
  characteristic of <math|\<cal-M\><rsup|\<ast\>>> that gives the ensuing
  <with|font-shape|italic|empirical> estimators of <math|\<tau\>> the
  important properties described in Section<nbsp><reference|pss>.

  <paragraph|<math|\<cal-M\><rsup|\<ast\>>> in
  <with|font-shape|italic|re-normalized> methods> The origins of this
  approach date back at least to <cite|GeorgeMcCulloch97>, who pointed out
  the possibility of using an MCMC sample <math|\<cal-M\><rsup|\<ast\>>> in
  combination with the normalized expression of Bayes factors as the basis
  for producing the required inferences. The posterior probability of the
  models in <math|\<cal-M\>> that are not sampled is taken to be zero, and
  for the rest <math|<wide|P*r|^><around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>\<propto\>B<rsub|\<gamma\>*0>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>>,
  normalized so that the probabilities of the visited models sum to one
  (i.e. probabilities are obtained by re-normalizing). Used in this way,
  MCMC methods act as search methods (on the grounds that good models
  should appear more frequently because they are more probable).

  Two observations follow from the method described in the preceding
  paragraph. First, since frequencies are not used, visiting a model more
  than once is a <with|font-shape|italic|waste> of time, suggesting that it
  would be preferable to sample without replacement. Second, unvisited
  models in general have very low probability, so the effort should focus
  mainly on sampling good models (with high posterior probabilities). These
  two ideas have inspired specific methods that search the model space for
  good models without repetition. Examples of such proposals are the
  Bayesian Adaptive Sampling method of <cite|Clyetal10> and the search
  method of <cite|BerMol05>, on which the Feature Inclusion Stochastic
  Search (FINCS) of <cite|ScCar09> and <cite|CarSc09> is based. A recurring
  idea in these methods is that the exploration of <math|\<cal-M\>> is
  guided by estimates of the inclusion probabilities of single covariates.
  This can lead to biased results, because high inclusion probabilities do
  not necessarily point to the most probable models. We will see a
  demonstration of this effect in the examples in
  Section<nbsp><reference|sec.exact> and Section<nbsp><reference|sec.larger>.

  <section|Inferences in model selection problems><label|pss>

  In a model selection problem, one relevant question is which of the
  proposed models is the most probable in the light of the data (the highest
  posterior probability model, HPM). In this situation, the quantity of
  interest is

  <\equation*>
    \<tau\>=<text|HPM>=<argmax><rsub|\<gamma\>\<in\>\<cal-M\>>B<rsub|\<gamma\>*0>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>.
  </equation*>

  Given a set of visited models <math|\<cal-M\><rsup|\<ast\>>>, the obvious
  and most precise way of estimating the HPM is common to
  <with|font-shape|italic|empirical> and <with|font-shape|italic|re-normalized>
  methods and is:

  <\equation*>
    <wide|\<tau\>|^>=<wide|<text|HPM>|^>=<argmax><rsub|\<gamma\>\<in\>\<cal-M\><rsup|\<ast\>>>B<rsub|\<gamma\>*0>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>|)>.
  </equation*>

  Notice that the quality of this estimate of the HPM depends only on the
  ability of the method to find good models in very large model spaces.

  Very frequently, however, the quantity of interest <math|\<tau\>> is of a
  different nature and, crucially, it often depends implicitly on the
  normalizing constant <math|C>. Many such quantities can be written as the
  posterior expectation

  <\equation>
    <label|total>\<tau\><around|(|a|)>=E<rsub|P*r<around|(|\<cdummy\>\<mid\><n|y>|)>><around|(|a<around|(|M<rsub|\<gamma\>>|)>|)>=<big|sum><rsub|\<gamma\>><space|0.17em>a<around|(|M<rsub|\<gamma\>>|)>*<space|0.17em>P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>,
  </equation>

  where <math|a<around|(|M<rsub|\<gamma\>>|)>> is a known function of
  <math|M<rsub|\<gamma\>>>. Clearly, the posterior probability of a single
  model <math|M<rsub|\<gamma\>\<ast\>>> can be expressed as
  (<reference|total>) with <math|a<around|(|M<rsub|\<gamma\>>|)>=1> if
  <math|M<rsub|\<gamma\>>=M<rsub|\<gamma\>\<ast\>>>, and zero otherwise.
  Many other quantities of interest in the model selection problem admit
  this representation.
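  To make (<reference|total>) concrete, the following Python sketch computes
  <math|\<tau\><around|(|a|)>> by full enumeration of a toy model space, with
  <math|a> passed as a function of the inclusion vector. It covers the
  single-model indicator above and the indicators used in the examples
  below. The posterior probabilities are hypothetical numbers chosen only
  for illustration, all names are hypothetical, and covariates are indexed
  from 0 in the code (so <verbatim|included(0)> corresponds to
  <math|x<rsub|1>>).

```python
from typing import Callable, Dict, Tuple

Model = Tuple[int, ...]            # inclusion vector gamma, e.g. (1, 0, 1)


def tau(a: Callable[[Model], float], post: Dict[Model, float]):
    """tau(a) = sum over gamma of a(M_gamma) * Pr(M_gamma | y), as in (total)."""
    return sum(a(gamma) * prob for gamma, prob in post.items())


def single_model(target: Model):
    """a(M_gamma) = 1 if M_gamma is the target model and 0 otherwise."""
    return lambda gamma: float(gamma == target)


def included(l: int):
    """a(M_gamma) = 1 if covariate l (0-based) is in M_gamma: inclusion probability."""
    return lambda gamma: float(gamma[l] == 1)


def dimension(k_star: int):
    """a(M_gamma) = 1 if M_gamma has exactly k_star covariates."""
    return lambda gamma: float(sum(gamma) == k_star)


# Hypothetical posterior over the four models with p = 2.
post = {(0, 0): 0.1, (1, 0): 0.4, (0, 1): 0.2, (1, 1): 0.3}
print(tau(single_model((1, 0)), post), tau(included(0), post), tau(dimension(1), post))
```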

  <with|font-size|0.84|<paragraph|Example 1: Inclusion probabilities and the
  median probability model> <with|font-shape|slanted|For a given explanatory
  variable <math|x<rsub|l>> the inclusion probability is defined as>>

  <\with|font-shape|slanted>
    <with|font-size|0.84|<\equation*>
      q<rsub|l>=<big|sum><rsub|\<gamma\>:<space|0.17em>\<gamma\><rsub|l>=1><space|0.17em>P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>.
    </equation*>>

    <with|font-size|0.84|These probabilities have interesting theoretical
    properties, as shown in <cite|BarBer04>, and are useful summaries of the
    posterior distribution. In particular, they can be helpful when the
    number of models is very large and the posterior probabilities of single
    models are so small that they are very difficult to interpret. Apart
    from their intrinsic interest, inclusion probabilities are the basis of
    the median probability model of <cite|BarBer04>. This model, hereafter
    called the MPM, is defined as the model that includes exactly those
    variables with <math|q<rsub|l>\<gtr\>0.5>, and the theory in
    <cite|BarBer04> suggests that the MPM has optimal properties and is
    better for prediction purposes than the HPM (a surprising fact). The
    probability <math|q<rsub|l>> can be expressed as (<reference|total>)
    with <math|a<around|(|M<rsub|\<gamma\>>|)>=1> if
    <math|\<gamma\><rsub|l>=1>, and 0 otherwise.>
  </with>

  <with|font-shape|slanted|font-size|0.84|Inclusion probabilities are the
  most popular element in a set of useful summaries for the variable
  selection problem. For instance, we may be interested in the joint
  posterior probability that both <math|x<rsub|l>> and
  <math|x<rsub|l<rprime|'>>> are included, and this measure can also be
  written easily in the form of (<reference|total>). >
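  The definitions of <math|q<rsub|l>> and of the MPM can be illustrated with
  a small Python sketch (NumPy assumed; names hypothetical). The
  hypothetical posterior below, over models with <math|p=2>, is chosen so
  that the MPM differs from the HPM: both covariates have inclusion
  probabilities above 0.5, yet the model containing both is the least
  probable of the three models with positive probability.

```python
import numpy as np


def inclusion_probs(post):
    """q_l = sum of Pr(M_gamma | y) over the models with gamma_l = 1."""
    models = np.array(list(post.keys()), dtype=float)    # one row per model
    probs = np.array(list(post.values()), dtype=float)
    return probs @ models


def median_probability_model(post):
    """MPM: include exactly the covariates whose q_l is above 0.5."""
    return tuple(int(v) for v in np.greater(inclusion_probs(post), 0.5))


def highest_posterior_model(post):
    """HPM: the model with the largest posterior probability."""
    return max(post, key=post.get)


# Hypothetical posterior with p = 2 (the model with no covariates has zero mass).
post = {(0, 0): 0.0, (1, 0): 0.35, (0, 1): 0.33, (1, 1): 0.32}
print(inclusion_probs(post))            # approximately [0.67 0.65]
print(median_probability_model(post))   # (1, 1)
print(highest_posterior_model(post))    # (1, 0)
```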

  <with|font-size|0.84|<paragraph|Example 2: Posterior probability of the
  dimension of the `true' model> <with|font-shape|slanted|The probability
  that the `true' model has exactly <math|k<rsup|\<ast\>>> explanatory
  covariates is>>

  <with|font-shape|slanted|font-size|0.84|<\equation*>
    d<around|(|k<rsup|\<ast\>>|)>=<big|sum><rsub|\<gamma\>:<space|0.17em>k<rsub|\<gamma\>>=k<rsup|\<ast\>>><space|0.17em>P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>.
  </equation*>>

  <with|font-shape|slanted|font-size|0.84|This corresponds to the expression
  in (<reference|total>) with <math|a<around|(|M<rsub|\<gamma\>>|)>=1> if
  <math|k<rsub|\<gamma\>>=k<rsup|\<ast\>>>, and 0 otherwise.>

  <with|font-size|0.84|<paragraph|Example 3: Model averaging techniques>
  <with|font-shape|slanted|Suppose that <math|\<Delta\>> is a quantity of
  interest. Then its posterior distribution
  <math|P*r<around|(|\<Delta\>\<mid\><n|y>|)>> is of the form
  (<reference|total>) with
  <math|a<around|(|M<rsub|\<gamma\>>|)>=P*r<around|(|\<Delta\>\<mid\><n|y>,M<rsub|\<gamma\>>|)>>.>>

  <with|font-shape|slanted|font-size|0.84|What arises is, of course, the
  methodology known as Bayesian Model Averaging, which is simply the
  Bayesian way of accounting for the uncertainty about which model is the
  true one (see e.g. <cite-raw|Hetal99>).>

  <with|font-shape|slanted|font-size|0.84|Special mention should be made of
  the case where <math|\<Delta\>> is a future observable
  <math|y<rsup|<text|new>>>, given certain values of the explanatory
  covariates <math|<n|x><rsup|<text|new>>>. In this case
  <math|P*r<around|(|y<rsup|<text|new>>\<mid\><n|y>|)>> is the posterior
  predictive distribution. Notice also that summaries of this distribution
  are special cases of (<reference|total>), such as the posterior predictive
  expectation (with
  <math|a<around|(|M<rsub|\<gamma\>>|)>=E<around|(|y<rsup|<text|new>>\<mid\>M<rsub|\<gamma\>>,<n|y>|)>>)
  or, through its first two moments, the posterior predictive
  variance.<next-line><vspace*|.5cm>>

  Now suppose that <math|\<cal-M\><rsup|\<ast\>>=<around|{|<n|\<gamma\>><rsup|<around|(|1|)>>,<n|\<gamma\>><rsup|<around|(|2|)>>,\<ldots\>,<n|\<gamma\>><rsup|<around|(|n|)>>|}>>
  has been randomly simulated with replacement from <math|\<cal-M\>> in
  such a way that, on each draw, each model <math|M<rsub|\<gamma\>>> has
  probability <math|P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>> of being
  selected (we think of <math|\<cal-M\><rsup|\<ast\>>> as approximately the
  sample of models produced with MCMC methods; see
  Section<nbsp><reference|sec.slm>). This is probability proportional to
  size sampling (see <cite-raw|Lohr99>), where the `size' of each sampling
  unit (model) is <math|P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>>. The
  usual estimator of <math|\<tau\><around|(|a|)>> in (<reference|total>)
  under this sampling scheme is

  <\equation*>
    <wide|\<tau\>|^><around|(|a|)>=<frac|1|n>*<space|0.17em><big|sum><rsub|j=1><rsup|n><space|0.17em><frac|a<around|(|M<rsub|\<gamma\><rsup|<around|(|j|)>>>|)>*P*r<around|(|M<rsub|\<gamma\><rsup|<around|(|j|)>>>\<mid\><n|y>|)>|P*r<around|(|M<rsub|\<gamma\><rsup|<around|(|j|)>>>\<mid\><n|y>|)>>=<frac|1|n>*<space|0.17em><big|sum><rsub|j=1><rsup|n><space|0.17em>a<around|(|M<rsub|\<gamma\><rsup|<around|(|j|)>>>|)>,
  </equation*>

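  which is known in the literature as the Hansen-Hurwitz estimator for
  sampling with replacement (<cite-raw|HanHur43>). It is easy to show that
  <math|<wide|\<tau\>|^><around|(|a|)>> is an unbiased estimator of
  <math|\<tau\><around|(|a|)>> (<cite-raw|Lohr99>): each term of the average
  has expectation <math|<big|sum><rsub|\<gamma\>>a<around|(|M<rsub|\<gamma\>>|)>*P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>=\<tau\><around|(|a|)>>.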
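  The unbiasedness of <math|<wide|\<tau\>|^><around|(|a|)>> is easy to check
  numerically. The Python sketch below (NumPy assumed; the posterior, the
  function <math|a> and all names are hypothetical) draws many independent
  samples of <math|n> models with replacement and probability proportional
  to <math|P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>>, and compares the
  average of the resulting estimates with the exact value
  <math|\<tau\><around|(|a|)>=0.65>.

```python
import numpy as np

rng = np.random.default_rng(1)
post = np.array([0.5, 0.3, 0.15, 0.05])   # hypothetical Pr(M_gamma | y) for 4 models
a = np.array([1.0, 0.0, 1.0, 0.0])        # a(M_gamma), e.g. an inclusion indicator
tau_exact = float(a @ post)               # tau(a) = 0.5 + 0.15 = 0.65


def hansen_hurwitz(a_draws, axis=-1):
    """Empirical estimator: a * Pr / Pr cancels, leaving the mean of a over the draws."""
    return np.mean(a_draws, axis=axis)


# reps independent samples, each of n models drawn with replacement and with
# probability proportional to Pr(M_gamma | y)
n, reps = 20, 20_000
idx = rng.choice(len(post), size=(reps, n), p=post)
estimates = hansen_hurwitz(a[idx])
print(tau_exact, estimates.mean())                       # the two values should be close
print(estimates.var(), tau_exact * (1 - tau_exact) / n)  # empirical vs exact variance
```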

  As a consequence, the Hansen-Hurwitz estimator of the posterior probability
  of a single model <math|M<rsup|\<ast\>>> is the relative frequency of
  <math|M<rsup|\<ast\>>> in <math|\<cal-M\><rsup|\<ast\>>>. Likewise, the
  Hansen-Hurwitz estimators of the quantities in the previous examples are

  <\equation*>
    <wide|q|^><rsub|l>=<frac|1|n>*<big|sum><rsub|j:<space|0.17em>\<gamma\><rsup|<around|(|j|)>><rsub|l>=1><space|0.17em>1,<space|1cm><wide|d|^><around|(|k<rsup|\<ast\>>|)>=<frac|1|n>*<big|sum><rsub|j:<space|0.17em>k<rsub|\<gamma\><rsup|<around|(|j|)>>>=k<rsup|\<ast\>>><space|0.17em>1,
  </equation*>

  and

  <\equation*>
    <wide|P*r|^><around|(|\<Delta\>\<mid\><n|y>|)>=<frac|1|n>*<big|sum><rsub|j=1><rsup|n>P*r<around|(|\<Delta\>\<mid\><n|y>,M<rsub|\<gamma\><rsup|<around|(|j|)>>>|)>.
  </equation*>

  Of course, these are just the <with|font-shape|italic|empirical> estimators
  (as labeled in Section<nbsp><reference|sec.est>) based on the frequency of
  visits. This correspondence is a key point, which provides theoretical
  support to the arguments introduced in Section<nbsp><reference|sec.est>
  and to our experience (partially presented in the following sections)
  regarding the very good results of <with|font-shape|italic|empirical>
  methods. It also becomes clear that these estimators do use the
  analytical expression of the Bayes factor, albeit implicitly, through the
  sampling mechanism. Moreover, they enjoy the desirable properties of
  Hansen-Hurwitz estimators, such as unbiasedness.
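  Given a sample of models stored as an <math|n\<times\>p> Boolean array
  (one row per draw), the empirical estimators above reduce to column means
  and counts. A minimal NumPy sketch, with hypothetical function names and
  a tiny hand-made sample:

```python
import numpy as np


def empirical_model_prob(draws, gamma):
    """Relative frequency of the model gamma among the n draws."""
    target = np.asarray(gamma, dtype=bool)
    return float(np.mean(np.all(draws == target, axis=1)))


def empirical_inclusion(draws):
    """q_hat_l: fraction of draws that include covariate l (column means)."""
    return draws.mean(axis=0)


def empirical_dimension(draws, k_star):
    """d_hat(k*): fraction of draws with exactly k_star covariates."""
    return float(np.mean(draws.sum(axis=1) == k_star))


# Hypothetical sample of n = 4 draws with p = 3 covariates.
draws = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 0, 1]], dtype=bool)
print(empirical_model_prob(draws, [1, 0, 1]))   # 0.5
print(empirical_inclusion(draws))               # [0.75 0.25 0.75]
print(empirical_dimension(draws, 2))            # 0.75
```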

  Furthermore, it is quite interesting that these estimators come with a
  measure of precision, a characteristic that, to our knowledge, has gone
  unnoticed in this context until now. This may have interesting
  applications and important consequences, for instance to determine
  whether <math|n> is large enough to estimate the quantity of interest with
  the desired precision. If the draws in <math|\<cal-M\><rsup|\<ast\>>> are
  obtained independently, the variance of
  <math|<wide|\<tau\>|^><around|(|a|)>> is (see e.g. <cite-raw|Lohr99>)

  <\equation*>
    V<around|(|<wide|\<tau\>|^><around|(|a|)>|)>=<frac|1|n>*<big|sum><rsub|\<gamma\>\<in\>\<cal-M\>><space|0.17em>P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>*<around*|(|a<around|(|M<rsub|\<gamma\>>|)>-\<tau\><around|(|a|)>|)><rsup|2>.
  </equation*>

  When the quantity of interest is a probability <math|p> (e.g. the
  probability of a single model or an inclusion probability),
  <math|V<around|(|<wide|\<tau\>|^><around|(|a|)>|)>=n<rsup|-1>*p*<around|(|1-p|)>>,
  which is, of course, bounded above by <math|1/<around|(|4*n|)>> (this
  bound being a reasonable measure of the variability for probabilities not
  very close to zero). This gives a clear idea of the precision achieved
  with the procedure and can be used (depending on the magnitude of the
  probability being estimated), for example, to decide the number of draws
  needed.
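  The bound <math|1/<around|(|4*n|)>> translates directly into a number of
  draws: to guarantee a standard error of at most <math|\<varepsilon\>> for
  any probability, it suffices that
  <math|n\<geq\>1/<around|(|4*\<varepsilon\><rsup|2>|)>>. For instance,
  <math|\<varepsilon\>=0.005> requires <math|n=10<rsup|4>> independent
  draws. A short Python sketch (all names hypothetical):

```python
import math


def hh_variance_probability(p, n):
    """Variance p(1 - p)/n of the empirical estimator of a probability p (independent draws)."""
    return p * (1.0 - p) / n


def worst_case_se(n):
    """Upper bound on the standard error, sqrt(1/(4n)), attained at p = 0.5."""
    return 0.5 / math.sqrt(n)


def draws_needed(target_se):
    """Smallest n whose worst-case standard error does not exceed target_se."""
    return math.ceil(0.25 / target_se ** 2)


print(worst_case_se(10_000))                              # 0.005
print(math.sqrt(hh_variance_probability(0.1, 10_000)))    # smaller, since p is far from 0.5
print(draws_needed(0.01))
```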

  Also useful is that an unbiased estimator of the variance of
  <math|<wide|\<tau\>|^><around|(|a|)>> is

  <\equation>
    <label|var.tau><wide|V|^><around|(|<wide|\<tau\>|^><around|(|a|)>|)>=<frac|1|n*<around|(|n-1|)>>*<space|0.17em><big|sum><rsub|j=1><rsup|n><space|0.17em><around*|(|a<around|(|M<rsub|\<gamma\><rsup|<around|(|j|)>>>|)>-<wide|\<tau\>|^><around|(|a|)>|)><rsup|2>.
  </equation>

  At this point it could be argued that these results are of limited
  practical importance, since in MCMC sampling schemes we are not sampling
  exactly from the posterior and the draws are dependent. Strictly speaking
  this is true, although our experience (partially shown in the following
  sections) is that these properties (the unbiasedness of
  <math|<wide|\<tau\>|^><around|(|a|)>> and the validity of
  <math|<wide|V|^><around|(|<wide|\<tau\>|^><around|(|a|)>|)>> as a variance
  estimate) hold quite accurately in practice. Moreover, these are basic
  assumptions underlying <with|font-shape|italic|any> analysis carried out
  with MCMC methods, and the literature contains plenty of techniques for
  improving MCMC output in this respect. Among them, probably the most
  popular, simplest to implement and yet very effective are
  <with|font-shape|italic|thinning> (systematically keeping one draw out of
  every several) and <with|font-shape|italic|burn-in> (discarding a number
  of the first draws).
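  A minimal Python sketch of burn-in and thinning follows. It also includes
  a batch-means variance estimate, a standard technique for dependent MCMC
  output that is not used in this paper and is shown only as one possible
  complement to (<reference|var.tau>), which assumes independent draws. All
  names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def burn_and_thin(chain: Sequence[T], burn_in: int, thin: int) -> List[T]:
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    if burn_in < 0 or thin < 1:
        raise ValueError("burn_in must be >= 0 and thin >= 1")
    return list(chain[burn_in::thin])


def batch_means_variance(values: Sequence[float], n_batches: int) -> float:
    """Batch-means estimate of the variance of the overall mean of a possibly
    autocorrelated sequence; draws left over after equal batches are dropped."""
    if n_batches < 2 or len(values) < n_batches:
        raise ValueError("need at least two non-empty batches")
    b = len(values) // n_batches  # batch length
    means = [sum(values[k * b:(k + 1) * b]) / b for k in range(n_batches)]
    grand = sum(means) / n_batches
    return sum((m - grand) ** 2 for m in means) / (n_batches * (n_batches - 1))


# Usage on a toy chain of 0/1 inclusion indicators.
chain = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
kept = burn_and_thin(chain, burn_in=2, thin=1)
print(kept, batch_means_variance(kept, n_batches=4))
```

  For independent draws the batch-means value and (<reference|var.tau>)
  estimate the same quantity; under positive autocorrelation the batch-means
  value is typically the larger of the two.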

  The estimator of <math|\<tau\><around|(|a|)>> within the
  <with|font-shape|italic|re-normalized> approach is

  <\equation*>
    <big|sum><rsub|\<gamma\>\<in\>M<rsup|\<ast\>>><space|0.17em>a<around|(|M<rsub|\<gamma\>>|)><space|0.17em><wide|P*r|^><around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>
  </equation*>

  where the posterior probabilities of single models are obtained by
  re-normalizing the Bayes factors, that is

  <\equation*>
    <wide|P*r|^><around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>=B<rsub|\<gamma\>*0>/<big|sum><rsub|\<gamma\>\<in\>M<rsup|\<ast\>>><space|0.17em>B<rsub|\<gamma\>*0>.
  </equation*>

  The properties of these estimators are, in general, difficult to derive.
  The bias of such estimators of the posterior probabilities of single models
  has recently been studied by <cite|ClyGho10>.
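  For comparison with the frequency-based estimator, here is a Python sketch
  of the re-normalized estimator, assuming (as in the examples below) a
  constant prior over models, so that posterior probabilities are
  proportional to Bayes factors. The Bayes factors are handled on the log
  scale; the function name and the model encoding are illustrative
  assumptions, and the visited models must be distinct.

```python
import math
from typing import Callable, Sequence, Tuple

Model = Tuple[int, ...]  # hypothetical encoding: 0/1 inclusion vector gamma


def renormalized_estimate(models: Sequence[Model],
                          log_bf: Sequence[float],
                          a: Callable[[Model], float]) -> float:
    """Return sum_M a(M) B_M0 / sum_M B_M0 over the distinct visited models.

    `log_bf` holds natural-log Bayes factors against M_0. Subtracting the
    largest one before exponentiating keeps the computation finite even when
    the Bayes factors themselves would overflow a double."""
    if not models or len(models) != len(log_bf):
        raise ValueError("need one log Bayes factor per visited model")
    top = max(log_bf)
    weights = [math.exp(lb - top) for lb in log_bf]  # B / max(B)
    total = sum(weights)
    return sum(a(m) * w for m, w in zip(models, weights)) / total


# Usage: two visited models, the second with three times the Bayes factor.
visited = [(1, 0), (1, 1)]
log_bfs = [115.0, 115.0 + math.log(3.0)]
estimate = renormalized_estimate(visited, log_bfs, lambda m: float(m[1]))
print(estimate)  # renormalized inclusion probability of covariate 2
```

  Note that, unlike the frequency-based estimator, nothing here depends on
  how often each model was visited, only on which models were found.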

  <section|Example I: a large problem with an exact
  solution><label|sec.exact>

  Mainly for comparative purposes, but also to report exact results for a
  moderately large problem (something that, to the best of our knowledge, has
  not been done before), we present here the exact solution of a problem with
  <math|p=35> covariates, and hence with
  <math|2<rsup|35>=34,359,738,368\<approx\>3.4\<cdot\>10<rsup|10>> different
  models. Having the exact results for a large problem seems to us the most
  informative and clarifying way of comparing the performance of searching
  methods.

  <\big-table>
    <with|font-size|0.84|<scalebox|0.75|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|12|12|1|-1|cell-bborder|1ln>|<table|<row|<cell|Variable>|<cell|Description>>|<row|<cell|<math|y>>|<cell|Response
    = Daily maximum 1-hour-average ozone reading (ppm) at Upland,
    CA>>|<row|<cell|<math|x<rsub|1>>>|<cell|Month: 1 = January, . . . , 12 =
    December>>|<row|<cell|<math|x<rsub|2>>>|<cell|Day of
    month>>|<row|<cell|<math|x<rsub|3>>>|<cell|Day of week: 1 = Monday, . . .
    , 7 = Sunday>>|<row|<cell|<math|x<rsub|4>>>|<cell|500-millibar pressure
    height (m) measured at Vandenberg AFB>>|<row|<cell|<math|x<rsub|5>>>|<cell|Wind
    speed (mph) at Los Angeles International Airport
    (LAX)>>|<row|<cell|<math|x<rsub|6>>>|<cell|Humidity (%) at
    LAX>>|<row|<cell|<math|x<rsub|7>>>|<cell|Temperature (degrees Fahrenheit)
    measured at Sandburg, CA>>|<row|<cell|<math|x<rsub|8>>>|<cell|Inversion
    base height (feet) at LAX>>|<row|<cell|<math|x<rsub|9>>>|<cell|Pressure
    gradient (mm Hg) from LAX to Daggett,
    CA>>|<row|<cell|<math|x<rsub|10>>>|<cell|Visibility (miles) measured at
    LAX>>>>> ><label|descrip>>
  </big-table|<with|font-size|0.84|Description of variables used in Example I
  and Example II>>

  It is generally accepted that problems with <math|p> larger than 25-30 are
  intractable by exhaustive enumeration <cite-parenthesized|Clyetal10>, and
  here we go a step beyond this limiting size. The results were obtained on a
  cloud with 150 processors and took approximately 20 hours to run. The code
  was written in C using the GNU Scientific Library (GSL)
  <cite-parenthesized|Galetal09>, and the source code is available upon
  request.
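  The exhaustive computation is embarrassingly parallel: the
  <math|2<rsup|p>> models can be indexed by the integers
  <math|0,\<ldots\>,2<rsup|p>-1> (bit <math|i> giving
  <math|\<gamma\><rsub|i+1>>) and split into contiguous blocks, one per
  processor, whose partial sums of Bayes factors are added at the end. The
  authors' C code is not reproduced here; the following Python sketch is a
  hypothetical illustration of such a split. With <math|p=35> and 150
  processors, each block contains about <math|2.3\<cdot\>10<rsup|8>> models.

```python
from typing import Callable, Tuple

Model = Tuple[int, ...]


def worker_range(p: int, n_workers: int, worker: int) -> range:
    """Contiguous block of model indices in [0, 2**p) given to one worker;
    the blocks of workers 0..n_workers-1 partition all 2**p models."""
    if not 0 <= worker < n_workers:
        raise ValueError("worker index out of range")
    chunk, extra = divmod(1 << p, n_workers)
    start = worker * chunk + min(worker, extra)
    stop = start + chunk + (1 if worker < extra else 0)
    return range(start, stop)


def gamma_of(index: int, p: int) -> Model:
    """Bit i of `index` is gamma_{i+1} (1 = covariate i+1 included)."""
    return tuple((index >> i) & 1 for i in range(p))


def partial_sum(p: int, n_workers: int, worker: int,
                bayes_factor: Callable[[Model], float]) -> float:
    """One worker's share of sum_gamma B_gamma0; adding the shares of all
    workers gives the normalizing constant."""
    return sum(bayes_factor(gamma_of(k, p))
               for k in worker_range(p, n_workers, worker))


def toy_bf(gamma: Model) -> float:
    """Toy Bayes factor 2**(model size), for illustration only."""
    return 2.0 ** sum(gamma)


total = sum(partial_sum(5, 3, w, toy_bf) for w in range(3))
print(total)  # equals (1 + 2)**5 = 243 for this toy Bayes factor
```

  In a real run the Bayes factors would be accumulated on the log scale,
  since their sum for Ozone35 is of order <math|10<rsup|50>>.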

  The data we analyzed were previously used by <cite|CasMor06> and
  <cite|BerMol05> and consist of <math|N=178> measurements of ozone
  concentration in the atmosphere; details can be found in <cite|CasMor06>.
  Of the 10 main effects originally considered, we use only those with an
  atmospheric meaning (<math|x<rsub|4>,\<ldots\>,x<rsub|10>>), as was done by
  <cite|liang08>. These 7 main effects, together with their 7 quadratic terms
  and 21 second-order interactions, produce the <math|p=35> possible
  regressors mentioned above. For comparative purposes, we keep the original
  notation of the variables defined in Table<nbsp><reference|descrip>. We
  call this dataset Ozone35 and now present its exact results. We use the
  <math|g>-prior with <math|g=N> and a constant prior over the model space.
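  The construction of the regressors can be illustrated with the following
  Python sketch (names are hypothetical; whether the columns are centred or
  scaled is not specified here, and the intercept, present in every model,
  is handled separately). It lists the main effects, squares and pairwise
  interactions and checks the counts: 7 main effects give
  <math|7+7+21=35> terms, and the 10 main effects of Example II give
  <math|10+10+45=65>.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple


def second_order_terms(main: Sequence[str]) -> List[Tuple[str, ...]]:
    """Main effects, then squares and pairwise interactions, as name tuples."""
    return [(x,) for x in main] + list(combinations_with_replacement(main, 2))


def design_row(obs: Dict[str, float],
               terms: Sequence[Tuple[str, ...]]) -> List[float]:
    """Values of all terms for one observation (products of main effects)."""
    return [math.prod(obs[v] for v in t) for t in terms]


ozone35 = second_order_terms([f"x{i}" for i in range(4, 11)])  # x4..x10
ozone65 = second_order_terms([f"x{i}" for i in range(1, 11)])  # x1..x10
print(len(ozone35), len(ozone65))
```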

  The sum of all Bayes factors (the proportionality constant) is

  <\equation*>
    <big|sum><rsub|\<gamma\>><space|0.17em>B<rsub|\<gamma\>*0><around|(|g|)>=1.13*<space|0.17em>10<rsup|50>.
  </equation*>

  The highest probability model (HPM) has covariates
  <math|<around|{|1,x<rsub|10>,x<rsub|4>*x<rsub|6>,x<rsub|6>*x<rsub|8>,x<rsub|7><rsup|2>,x<rsub|7>*x<rsub|10>|}>>
  and a posterior probability of <math|0.0009>, with a Bayes factor (in its
  favor and against <math|M<rsub|0>>) of
  <math|1.02*<space|0.17em>10<rsup|47>>. The 1000 most probable models
  together accumulate a posterior probability of <math|0.07>, and the decimal
  logarithm of the sum of their Bayes factors is 48.92 (this value is used
  later). The inclusion probabilities of each variable are given in
  Table<nbsp><reference|incprob>. Hence the median inclusion probability
  model (MPM), formed by the covariates with inclusion probability of at
  least 1/2, is
  <math|<around|{|1,x<rsub|6><rsup|2>,x<rsub|6>*x<rsub|7>,x<rsub|6>*x<rsub|8>,x<rsub|7>*x<rsub|10>|}>>,
  whose posterior probability is twenty-three times lower than that of the
  HPM. Moreover, there are 851 models that are more probable than the MPM.
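  Since the prior over models is constant,
  <math|P*r<around|(|M<rsub|\<gamma\>>\<mid\><n|y>|)>> is the Bayes factor
  <math|B<rsub|\<gamma\>*0>> divided by the sum of all Bayes factors, so the
  figures above can be checked with a few lines of Python. The sketch below
  (function names hypothetical) also forms the MPM from a subset of the exact
  inclusion probabilities of Table<nbsp><reference|incprob> that contains
  every covariate with probability at least 1/2.

```python
import math
from typing import Dict, Set

LOG10_TOTAL = math.log10(1.13e50)  # log10 of the sum of all Bayes factors


def posterior_prob(log10_bf: float) -> float:
    """Pr(M | y) = B / (sum of all B) under the constant model prior."""
    return 10 ** (log10_bf - LOG10_TOTAL)


def median_probability_model(incl: Dict[str, float]) -> Set[str]:
    """Covariates whose inclusion probability is at least 1/2."""
    return {v for v, q in incl.items() if q >= 0.5}


p_hpm = posterior_prob(math.log10(1.02e47))  # about 0.0009
p_top = posterior_prob(48.92)                # about 0.07 (1000 best models)

# A subset of the exact inclusion probabilities for Ozone35.
exact = {"1": 1.0, "x10": 0.368, "x4*x6": 0.325, "x6^2": 0.532,
         "x6*x7": 0.636, "x6*x8": 0.560, "x7^2": 0.450, "x7*x10": 0.743}
print(round(p_hpm, 4), round(p_top, 2), sorted(median_probability_model(exact)))
```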

  We then ran each of the following methods ten times, with
  <math|n=10000> iterations per run.

  <\itemize>
    <item*|<with|font-family|tt|Freq>>Gibbs sampling with the algorithm in
    Appendix<nbsp><reference|ApGS>, with initial model
    <math|M<rsup|<around|(|0|)>>=M<rsub|0>> (we did not observe differences
    when starting from the full model or from a randomly chosen model). For a
    fair comparison among the methods, we did not discard any sampled model
    and did not use a burn-in period.

    <item*|<with|font-family|tt|BAS>>Bayesian adaptive sampling of
    <cite|Clyetal10> through the corresponding R-package
    <with|font-family|tt|BAS>. As recommended (personal communication), we
    used <with|font-family|tt|method="MCMC+BAS">, which uses an MCMC method
    to initialize the search (this is a clear improvement over other options
    like <with|font-family|tt|eplogp>, which uses a rough approximation of
    inclusion probabilities with p-values to initialize the search). We set
    the parameter <with|font-family|tt|update=500>, so that the sampling
    probabilities were updated every 500 iterations.

    <item*|<with|font-family|tt|SSBM>>The Stochastic Search in
    <cite|BerMol05>. This method was originally proposed for a particular
    prior but the searching algorithm can be easily adapted to accommodate
    the <math|g>-prior.
  </itemize>
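  A simplified Python sketch of the sampling step behind
  <with|font-family|tt|Freq> follows, based on the full conditionals
  (<reference|FullCH>) in Appendix<nbsp><reference|ApGS>. It assumes a fixed
  <math|g> (as with <math|g=N> here) and a constant prior over models, so
  that <math|p<rsub|i>=B<rsub|a*0>/<around|(|B<rsub|a*0>+B<rsub|b*0>|)>>.
  The log Bayes factor function is supplied by the user (a toy additive one
  is used below), and treating one full sweep over
  <math|\<gamma\><rsub|1>,\<ldots\>,\<gamma\><rsub|p>> as one iteration is
  an assumption of this sketch, not a statement about the authors' code.

```python
import math
import random
from typing import Callable, List, Tuple

Model = Tuple[int, ...]


def prob_include(log_b_a: float, log_b_b: float) -> float:
    """p_i = B_a / (B_a + B_b), computed stably from log Bayes factors."""
    d = log_b_a - log_b_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def gibbs_variable_selection(log_bf: Callable[[Model], float], p: int,
                             n_iter: int, seed: int = 0) -> List[Model]:
    """Systematic-scan Gibbs sampler on gamma in {0,1}^p, started at the null
    model M_0, with g held fixed and a constant prior over models."""
    rng = random.Random(seed)
    gamma = [0] * p
    draws: List[Model] = []
    for _ in range(n_iter):
        for i in range(p):
            gamma[i] = 1
            log_a = log_bf(tuple(gamma))  # model a: gamma_i = 1
            gamma[i] = 0
            log_b = log_bf(tuple(gamma))  # model b: gamma_i = 0
            gamma[i] = 1 if rng.random() < prob_include(log_a, log_b) else 0
        draws.append(tuple(gamma))
    return draws


weights = [2.0, 0.0, -2.0]


def toy_log_bf(gamma: Model) -> float:
    """Toy log Bayes factor, additive in gamma (illustration only)."""
    return sum(w * g for w, g in zip(weights, gamma))


draws = gibbs_variable_selection(toy_log_bf, p=3, n_iter=5000, seed=1)
freq = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print(freq)  # close to 1 / (1 + exp(-w)) for each weight w
```

  The visit frequencies in <with|font-family|tt|draws> are then plugged
  directly into the empirical estimator.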

  The estimates computed in <with|font-family|tt|Freq> are proportional to
  size (see Section<nbsp><reference|pss>), that is, based on the frequency of
  visits, whereas in <with|font-family|tt|BAS> and <with|font-family|tt|SSBM>
  the estimators are based on the renormalization of the Bayes factors.
  Hence, <with|font-family|tt|Freq> is a particular method within the
  <with|font-shape|italic|empirical> approach, while
  <with|font-family|tt|BAS> and <with|font-family|tt|SSBM> belong to the
  <with|font-shape|italic|re-normalized> approach (using the labels
  introduced in Section<nbsp><reference|sec.est>). The results are summarized
  in Table<nbsp><reference|incprob>.

  For the first run of each method, Table<nbsp><reference|incprob> presents
  the estimates of the inclusion probabilities, the MPM and the HPM. With
  this same run we estimated the standard deviation of the
  <with|font-family|tt|Freq> estimators of the inclusion probabilities using
  (<reference|var.tau>). In addition, from the ten runs we computed the
  observed standard deviation, since this provides a measure of variability
  for <with|font-family|tt|BAS> and <with|font-family|tt|SSBM> (for which no
  expression like (<reference|var.tau>) exists).

  The main conclusions that we draw from these simulations can be summarized
  as follows:

  <paragraph|Regarding the MPM and inclusion probabilities> Of the ten
  experiments conducted, <with|font-family|tt|Freq> correctly identified the
  MPM in all ten, whereas the MPMs estimated by <with|font-family|tt|BAS> and
  <with|font-family|tt|SSBM> did not coincide with the true MPM in any run.
  Also, <with|font-family|tt|Freq> provides very accurate estimates of the
  (exact) inclusion probabilities with small variability, which confirms the
  high efficiency of such estimators. The observed variability of
  <with|font-family|tt|BAS> and <with|font-family|tt|SSBM> is large, so in
  general we expect large differences between repeated runs of these
  methods, similar to those observed in this experiment.

  One great advantage of <with|font-shape|italic|empirical> methods over
  <with|font-shape|italic|re-normalized> ones is that the former come with a
  measure of the precision of the estimates, given by (<reference|var.tau>).
  This measure can legitimately be criticized since, owing to the dependence
  between the simulations, it is only an approximation. Nevertheless, in
  this experiment these estimates and the observed standard deviations are
  quite close to each other, suggesting that (<reference|var.tau>) is quite a
  reasonable estimator.
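  Under independent sampling, the agreement between the two measures of
  precision is expected. The following Python sketch (names hypothetical)
  simulates repeated runs of i.i.d. Bernoulli draws and compares the average
  standard deviation given by (<reference|var.tau>) with the observed
  standard deviation across runs. It does not reproduce the dependence
  present in the Gibbs chains, for which (<reference|var.tau>) remains an
  approximation.

```python
import math
import random
import statistics
from typing import Tuple


def compare_sd(p: float, n: int, runs: int, seed: int = 0) -> Tuple[float, float]:
    """Simulate `runs` independent runs of n i.i.d. Bernoulli(p) draws and
    return (mean per-run sd from the variance estimator, observed sd of the
    estimates across runs), i.e. the analogues of [V_hat]^(1/2) and S."""
    rng = random.Random(seed)
    estimates, est_sds = [], []
    for _ in range(runs):
        x = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
        tau = sum(x) / n
        v_hat = sum((xi - tau) ** 2 for xi in x) / (n * (n - 1))
        estimates.append(tau)
        est_sds.append(math.sqrt(v_hat))
    return statistics.mean(est_sds), statistics.stdev(estimates)


print(compare_sd(p=0.3, n=10_000, runs=10))
print(math.sqrt(0.3 * 0.7 / 10_000))  # theoretical sd, about 0.0046
```

  With only ten runs the observed standard deviation is itself quite noisy,
  so moderate discrepancies between the two columns are to be expected.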

  The most worrying feature of <with|font-family|tt|BAS> and
  <with|font-family|tt|SSBM> is that they are clearly biased: for certain
  covariates the exact inclusion probability lies more than 10 standard
  deviations away from the point estimate (see e.g.
  <math|x<rsub|6>*x<rsub|8>> in <with|font-family|tt|BAS> and
  <math|x<rsub|5>*x<rsub|10>> in <with|font-family|tt|SSBM>). The nature and
  origin of this bias become clear after a careful reading of the table. In
  <with|font-family|tt|BAS>, six inclusion probabilities are overestimated,
  five of which correspond to variables in the estimated HPM; the rest
  (apart from the intercept) are underestimated. A similar pattern is found
  in <with|font-family|tt|SSBM>. This means that the inclusion probabilities
  estimated by these methods are strongly influenced by the estimated HPM,
  leading to a bias in the direction of the HPM. In our opinion, this effect
  is a manifestation of a search of the model space for good models guided
  by the inclusion probabilities.
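  The mechanism behind this bias can be seen in a deliberately crude toy
  example: renormalizing the Bayes factors over only the <math|k> best
  models, which is a caricature and not the algorithm of
  <with|font-family|tt|BAS> or <with|font-family|tt|SSBM>. In the
  hypothetical Python sketch below the log Bayes factor is additive in
  <math|<n|\<gamma\>>>, so the exact inclusion probabilities are known; with
  <math|k=3>, the estimates for the covariates in the HPM are inflated and
  the one outside it is deflated.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Model = Tuple[int, ...]


def inclusion_probs(models: Sequence[Model], weights: Sequence[float],
                    p: int) -> List[float]:
    """Weighted share of the models that include each covariate."""
    total = sum(weights)
    return [sum(w for m, w in zip(models, weights) if m[i]) / total
            for i in range(p)]


def exact_and_top_k(log_bf_weights: Sequence[float], k: int):
    """Toy posterior with log B(gamma) = sum_i w_i * gamma_i. Returns the exact
    inclusion probabilities (full enumeration) and the re-normalized ones
    computed from the k models with the largest Bayes factors only."""
    p = len(log_bf_weights)
    models = list(itertools.product((0, 1), repeat=p))
    bf = [math.exp(sum(w * g for w, g in zip(log_bf_weights, m))) for m in models]
    exact = inclusion_probs(models, bf, p)
    best = sorted(range(len(models)), key=lambda j: bf[j], reverse=True)[:k]
    renorm = inclusion_probs([models[j] for j in best], [bf[j] for j in best], p)
    return exact, renorm


exact, renorm = exact_and_top_k([2.0, 0.5, -1.0], k=3)
print([round(q, 3) for q in exact])   # covariates 1 and 2 form the HPM
print([round(q, 3) for q in renorm])  # pulled towards the HPM
```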

  <paragraph|Regarding the HPM and probability mass discovered> An
  interesting question is which method visits better models.
  <with|font-family|tt|BAS> and <with|font-family|tt|SSBM> are, in a sense,
  specifically designed for this purpose, whereas in MCMC methods this
  property is expected to hold automatically (more probable models should be
  visited precisely because they are more probable). In our experiment,
  <with|font-family|tt|BAS> correctly identified the HPM nine times and
  <with|font-family|tt|SSBM> five times. The exact HPM was among the models
  visited by <with|font-family|tt|Freq> in all ten runs, showing that
  <with|font-family|tt|Freq> does visit good models.

  Finally, we calculated the mean and standard deviation (over the ten runs)
  of the decimal logarithm of the sum of the Bayes factors of the 1000 most
  probable distinct models explored, to be compared with the exact value of
  48.92 given above. The results were 48.77 (0.01), 48.64 (0.05) and 48.50
  (0.20) for <with|font-family|tt|Freq>, <with|font-family|tt|BAS> and
  <with|font-family|tt|SSBM>, respectively. In this respect the three methods
  behave similarly, although <with|font-family|tt|Freq> gives more stable
  answers.
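  Sums of Bayes factors of order <math|10<rsup|48>> are best handled on the
  log scale. A Python sketch of the summary used above (the decimal
  logarithm of the sum of the Bayes factors of the 1000 best distinct models
  explored) follows; keying the input by model removes repeated visits, and
  the function name is hypothetical.

```python
import math
from typing import Hashable, Mapping


def log10_sum_top(log10_bf: Mapping[Hashable, float], k: int = 1000) -> float:
    """log10 of the sum of the k largest Bayes factors among the distinct
    models explored (keys are models, values are log10 Bayes factors)."""
    if not log10_bf or k < 1:
        raise ValueError("need at least one model and k >= 1")
    top = sorted(log10_bf.values(), reverse=True)[:k]
    m = top[0]  # factor out the largest term to avoid overflow
    return m + math.log10(sum(10.0 ** (x - m) for x in top))


explored = {(1, 0, 1): 47.0, (1, 1, 1): 47.0, (0, 0, 1): 40.0}
print(log10_sum_top(explored, k=2))  # 47 + log10(2)
```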

  <\big-table>
    <with|font-size|0.84|<scalebox|0.75|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|7|7|cell-halign|c>|<cwith|1|-1|8|8|cell-halign|c>|<cwith|1|-1|9|9|cell-halign|c>|<cwith|1|-1|10|10|cell-halign|c>|<cwith|1|-1|11|11|cell-halign|c>|<cwith|1|-1|12|12|cell-halign|c>|<cwith|1|-1|12|12|cell-rborder|0ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|9|9|1|-1|cell-valign|top>|<cwith|9|9|1|-1|cell-vmode|exact>|<cwith|9|9|1|-1|cell-height|<plus|1fn|.1cm>>|<cwith|10|10|1|-1|cell-bborder|1ln>|<cwith|18|18|1|-1|cell-valign|top>|<cwith|18|18|1|-1|cell-vmode|exact>|<cwith|18|18|1|-1|cell-height|<plus|1fn|.1cm>>|<cwith|19|19|1|-1|cell-bborder|1ln>|<cwith|27|27|1|-1|cell-valign|top>|<cwith|27|27|1|-1|cell-vmode|exact>|<cwith|27|27|1|-1|cell-height|<plus|1fn|.1cm>>|<cwith|28|28|1|-1|cell-bborder|1ln>|<cwith|36|36|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|Method>|<cell|<math|1>>|<cell|<math|x<rsub|4>>>|<cell|<math|x<rsub|5>>>|<cell|<math|x<rsub|6>>>|<cell|<math|x<rsub|7>>>|<cell|<math|x<rsub|8>>>|<cell|<math|x<rsub|9>>>|<cell|<math|x<rsub|10>>>|<cell|<math|x<rsub|4><rsup|2>>>>|<row|<cell|<math|q<rsub|l>>>|<cell|exact>|<cell|1<math|\<ast\>\<dag\>>>|<cell|0.164>|<cell|0.096>|<cell|0.297>|<cell|0.195>|<cell|0.200>|<cell|0.291>|<cell|0.368*>|<cell|0.164>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|Freq>>|<cell|1<math|\<ast\>\<dag\>>>|<cell|0.157>|<cell|0.099>|<cell|0.300>|<cell|0.191>|<cell|0.200>|<cell|0.292>|<cell|0.368*>|<cell|0.162>>|<row|<cell|<math|<around|[|<wide|V|^><around|(|<wide|q|^><rsub|l>|)>|]><rsup|1/2>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0)>|<cell|(0.004)>|<cell|(0.003)>|<cell|(0.005)>|<cell|(0.004)>|<cell|(0.004)>|<cell|(0.005)>|<cell|(0.005)>|<cell|(0.004)>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0)>|<cell|(0.005)>|<cell|(0.002)>|<cell|(0.007)>|<cell|(0.008)>|<cell|(0.007)>|<cell|(0.004)>|<cell|(0.005)>|<cell|(0.004)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|BAS>>|<cell|1<math|\<ast\>\<dag\>>>|<cell|0.022>|<cell|0.01>|<cell|0.231>|<cell|0.032>|<cell|0.025>|<cell|0.092>|<cell|0.508<math|\<ast\>\<dag\>>>|<cell|0.023>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|BAS>>|<cell|(0)>|<cell|(0.007)>|<cell|(0.003)>|<cell|(0.046)>|<cell|(0.017)>|<cell|(0.018)>|<cell|(0.027)>|<cell|(0.078)>|<cell|(0.006)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|SSBM>>|<cell|1<math|\<ast\>\<dag\>>>|<cell|0.105>|<cell|0.03>|<cell|0.04>|<cell|0.053>|<cell|0.073>|<cell|0.297>|<cell|0.131>|<cell|0.125>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|SSBM>>|<cell|(0)>|<cell|(0.034)>|<cell|(0.006)>|<cell|(0.154)>|<cell|(0.021)>|<cell|(0.046)>|<cell|(0.086)>|<cell|(0.277)>|<cell|(0.037)>>|<row|<cell|>|<cell|>|<cell|<math|x<rsub|4>*x<rsub|5>>>|<cell|<math|x<rsub|4>*x<rsub|6>>>|<cell|<math|x<rsub|4>*x<rsub|7>>>|<cell|<math|x<rsub|4>*x<rsub|8>>>|<cell|<math|x<rsub|4>*x<rsub|9>>>|<cell|<math|x<rsub|4>*x<rsub|10>>>|<cell|<math|x<rsub|5><rsup|2>>>|<cell|<math|x<rsub|5>*x<rsub|6>>>|<cell|<math|x<rsub|5>*x<rsub|7>>>>|<row|<cell|<math|q<rsub|l>>>|<cell|exact>|<cell|0.095>|<cell|0.325*>|<cell|0.252>|<cell|0.208>|<cell|0.301>|<cell|0.361>|<cell|0.124>|<cell|0.107>|<cell|0.094>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|Freq>>|<cell|0.094>|<cell|0.320*>|<cell|0.244>|<cell|0.210>|<cell|0.303>|<cell|0.360>|<cell|0.127>|<cell|0.104>|<cell|0.095>>|<row|<cell|<math|<around|[|<wide|V|^><around|(|<wide|q|^><rsub|l>|)>|]><rsup|1/2>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.003)>|<cell|(0.005)>|<cell|(0.004)>|<cell|(0.004)>|<cell|(0.005)>|<cell|(0.005)>|<cell|(0.003)>|<cell|(0.003)>|<cell|(0.003)>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.002)>|<cell|(0.01)>|<cell|(0.006)>|<cell|(0.005)>|<cell|(0.008)>|<cell|(0.006)>|<cell|(0.002)>|<cell|(0.003)>|<cell|(0.003)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|BAS>>|<cell|0.019>|<cell|0.373*>|<cell|0.164>|<cell|0.061>|<cell|0.078>|<cell|0.416>|<cell|0.019>|<cell|0.013>|<cell|0.012>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|BAS>>|<cell|(0.003)>|<cell|(0.09)>|<cell|(0.043)>|<cell|(0.024)>|<cell|(0.049)>|<cell|(0.061)>|<cell|(0.005)>|<cell|(0.004)>|<cell|(0.003)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|SSBM>>|<cell|0.037>|<cell|0.03>|<cell|0.082>|<cell|0.092>|<cell|0.348*>|<cell|0.132>|<cell|0.047>|<cell|0.049>|<cell|0.035>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|SSBM>>|<cell|(0.007)>|<cell|(0.295)>|<cell|(0.285)>|<cell|(0.16)>|<cell|(0.098)>|<cell|(0.33)>|<cell|(0.008)>|<cell|(0.011)>|<cell|(0.008)>>|<row|<cell|>|<cell|>|<cell|<math|x<rsub|5>*x<rsub|8>>>|<cell|<math|x<rsub|5>*x<rsub|9>>>|<cell|<math|x<rsub|5>*x<rsub|10>>>|<cell|<math|x<rsub|6><rsup|2>>>|<cell|<math|x<rsub|6>*x<rsub|7>>>|<cell|<math|x<rsub|6>*x<rsub|8>>>|<cell|<math|x<rsub|6>*x<rsub|9>>>|<cell|<math|x<rsub|6>*x<rsub|10>>>|<cell|<math|x<rsub|7><rsup|2>>>>|<row|<cell|<math|q<rsub|l>>>|<cell|exact>|<cell|0.098>|<cell|0.088>|<cell|0.124>|<cell|0.532<math|\<dag\>>>|<cell|0.636<math|\<dag\>>>|<cell|0.560<math|\<ast\>\<dag\>>>|<cell|0.126>|<cell|0.115>|<cell|0.450*>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|Freq>>|<cell|0.098>|<cell|0.087>|<cell|0.124>|<cell|0.524<math|\<dag\>>>|<cell|0.634<math|\<dag\>>>|<cell|0.564<math|\<ast\>\<dag\>>>|<cell|0.127>|<cell|0.113>|<cell|0.465*>>|<row|<cell|<math|<around|[|<wide|V|^><around|(|<wide|q|^><rsub|l>|)>|]><rsup|1/2>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.003)>|<cell|(0.003)>|<cell|(0.003)>|<cell|(0.005)>|<cell|(0.005)>|<cell|(0.005)>|<cell|(0.003)>|<cell|(0.003)>|<cell|(0.005)>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.002)>|<cell|(0.003)>|<cell|(0.004)>|<cell|(0.008)>|<cell|(0.012)>|<cell|(0.007)>|<cell|(0.003)>|<cell|(0.001)>|<cell|(0.009)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|BAS>>|<cell|0.009>|<cell|0.014>|<cell|0.014>|<cell|0.282>|<cell|0.493>|<cell|0.929<math|\<ast\>\<dag\>>>|<cell|0.025>|<cell|0.019>|<cell|0.793<math|\<ast\>\<dag\>>>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|BAS>>|<cell|(0.003)>|<cell|(0.003)>|<cell|(0.004)>|<cell|(0.078)>|<cell|(0.117)>|<cell|(0.034)>|<cell|(0.004)>|<cell|(0.007)>|<cell|(0.066)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|SSBM>>|<cell|0.027>|<cell|0.017>|<cell|0.023>|<cell|0.98<math|\<ast\>\<dag\>>>|<cell|1<math|\<ast\>\<dag\>>>|<cell|0.077>|<cell|0.078>|<cell|0.031>|<cell|0.112>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|SSBM>>|<cell|(0.005)>|<cell|(0.01)>|<cell|(0.009)>|<cell|(0.301)>|<cell|(0.37)>|<cell|(0.339)>|<cell|(0.017)>|<cell|(0.043)>|<cell|(0.342)>>|<row|<cell|>|<cell|>|<cell|<math|x<rsub|7>*x<rsub|8>>>|<cell|<math|x<rsub|7>*x<rsub|9>>>|<cell|<math|x<rsub|7>*x<rsub|10>>>|<cell|<math|x<rsub|8><rsup|2>>>|<cell|<math|x<rsub|8>*x<rsub|9>>>|<cell|<math|x<rsub|8>*x<rsub|10>>>|<cell|<math|x<rsub|9><rsup|2>>>|<cell|<math|x<rsub|9>*x<rsub|10>>>|<cell|<math|x<rsub|10><rsup|2>>>>|<row|<cell|<math|q<rsub|l>>>|<cell|exact>|<cell|0.349>|<cell|0.431>|<cell|0.743<math|\<ast\>\<dag\>>>|<cell|0.142>|<cell|0.263>|<cell|0.236>|<cell|0.434>|<cell|0.103>|<cell|0.117>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|Freq>>|<cell|0.346>|<cell|0.430>|<cell|0.756<math|\<ast\>\<dag\>>>|<cell|0.140>|<cell|0.264>|<cell|0.231>|<cell|0.440>|<cell|0.103>|<cell|0.116>>|<row|<cell|<math|<around|[|<wide|V|^><around|(|<wide|q|^><rsub|l>|)>|]><rsup|1/2>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.005)>|<cell|(0.005)>|<cell|(0.004)>|<cell|(0.003)>|<cell|(0.004)>|<cell|(0.004)>|<cell|(0.005)>|<cell|(0.003)>|<cell|(0.003)>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|Freq>>|<cell|(0.008)>|<cell|(0.01)>|<cell|(0.006)>|<cell|(0.004)>|<cell|(0.008)>|<cell|(0.004)>|<cell|(0.005)>|<cell|(0.003)>|<cell|(0.002)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|BAS>>|<cell|0.091>|<cell|0.124>|<cell|0.965<math|\<ast\>\<dag\>>>|<cell|0.017>|<cell|0.045>|<cell|0.127>|<cell|0.393>|<cell|0.022>|<cell|0.018>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|BAS>>|<cell|(0.047)>|<cell|(0.073)>|<cell|(0.026)>|<cell|(0.013)>|<cell|(0.032)>|<cell|(0.028)>|<cell|(0.032)>|<cell|(0.005)>|<cell|(0.007)>>|<row|<cell|<math|<wide|q|^><rsub|l>>>|<cell|<with|font-family|tt|SSBM>>|<cell|0.975<math|\<ast\>\<dag\>>>|<cell|0.663<math|\<ast\>\<dag\>>>|<cell|0.879<math|\<ast\>\<dag\>>>|<cell|0.026>|<cell|0.57<math|\<ast\>\<dag\>>>|<cell|0.597<math|\<ast\>\<dag\>>>|<cell|0.244>|<cell|0.059>|<cell|0.064>>|<row|<cell|<math|S<around|(|<wide|q|^><rsub|l>|)>>>|<cell|<with|font-family|tt|SSBM>>|<cell|(0.425)>|<cell|(0.193)>|<cell|(0.209)>|<cell|(0.01)>|<cell|(0.164)>|<cell|(0.178)>|<cell|(0.084)>|<cell|(0.015)>|<cell|(0.015)>>>>>
    ><label|incprob>>
  </big-table|<with|font-size|0.84|Inclusion probabilities (exact
  <math|q<rsub|l>> and estimates <math|<wide|q|^><rsub|l>> from one run of
  10000 iterations) for the Ozone35 dataset.
  <math|<around|[|<wide|V|^><around|(|<wide|q|^><rsub|l>|)>|]><rsup|1/2>> is
  the estimated standard deviation obtained from (<reference|var.tau>) using
  this same run, and <math|S<around|(|<wide|q|^><rsub|l>|)>> is the standard
  deviation of the estimates observed over ten independent runs. Daggers
  (<math|\<dag\>>) mark the variables in the MPM and asterisks
  (<math|\<ast\>>) those in the HPM (the true models in the exact rows, the
  estimated models otherwise).>>

  <section|Example II: A much larger problem><label|sec.larger>

  We now consider the full Ozone dataset with the 10 main effects, their
  quadratic terms and their second-order interactions. The same problem was
  considered by <cite|BerMol05> and, as before, we keep their notation for
  the covariates. This problem has <math|p=65> and hence
  <math|2<rsup|65>\<approx\>3.7*<space|0.17em>10<rsup|19>> models in
  <math|\<cal-M\>>. In what follows we call this dataset Ozone65. The size of
  <math|\<cal-M\>> precludes obtaining the exact answer: to give an idea of
  how infeasible this is, with the C code we used for Ozone35 it would take
  more than 350 years to compute the answer on a cloud with
  <math|10<rsup|6>> processors. We cannot wait that long.
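  The running-time figure can be reproduced with a back-of-the-envelope
  calculation starting from the 20 hours on 150 processors reported for
  Ozone35, under the assumption (made only for this sketch) that the cost
  grows linearly with the number of models:

```python
# Back-of-the-envelope check of the Ozone65 running time. Assumes the cost
# grows linearly with the number of models; larger models are in fact more
# expensive to evaluate, so this estimate is, if anything, optimistic.
HOURS_OZONE35 = 20        # wall-clock hours reported for Ozone35
PROCESSORS_OZONE35 = 150

proc_hours_per_model = HOURS_OZONE35 * PROCESSORS_OZONE35 / 2 ** 35
proc_hours_ozone65 = proc_hours_per_model * 2 ** 65
years_with_1e6_processors = proc_hours_ozone65 / 1e6 / (24 * 365)

print(f"{2 ** 65:.2e} models")
print(f"{years_with_1e6_processors:.0f} years with 10**6 processors")
```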

  For this dataset we repeated the comparison of
  Section<nbsp><reference|sec.exact>, now with 10 runs of
  <math|n=100,000> iterations each of <with|font-family|tt|Freq>,
  <with|font-family|tt|BAS> and <with|font-family|tt|SSBM>. In
  Table<nbsp><reference|oz65incprob> we present statistics for all the
  variables that were included in the estimated HPM or MPM in at least one
  run of the methods being compared. In essence, these results agree clearly
  with our findings for Ozone35 and confirm the conclusions drawn there.

  <paragraph|Regarding the MPM and inclusion probabilities> It is unknown
  which model is the MPM (and it will probably never be known), but the
  results of <with|font-family|tt|Freq> provide a very reasonable and
  consistent picture of the solution: all but one of its ten runs agree on
  the estimated MPM. Moreover, and quite appealingly, the disagreement can be
  explained in terms of estimation error. The discordant run differs from all
  the others only in that it includes <math|x<rsub|1>*x<rsub|4>>, a variable
  with an estimated inclusion probability of <math|0.497> and an estimated
  standard error of <math|0.002>, that is, only 1.5 standard errors below the
  threshold of <math|0.5>.
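  The size of this discrepancy can be quantified with a rough normal
  approximation, treating <math|0.497> as the true inclusion probability and
  <math|0.002> as the standard error of a single run (assumptions made only
  for this illustration; the function names are hypothetical):

```python
import math


def z_to_threshold(q_hat: float, se: float, threshold: float = 0.5) -> float:
    """Distance from an estimated inclusion probability to the MPM cut-off,
    measured in estimated standard errors."""
    return (threshold - q_hat) / se


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = z_to_threshold(0.497, 0.002)   # x1*x4 in the first Freq run
p_cross = normal_upper_tail(z)     # chance a single run lands above 0.5
print(round(z, 2), round(p_cross, 3))
```

  A probability of roughly 0.07 per run makes one discordant run out of ten
  entirely plausible.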

  In <with|font-family|tt|BAS> and <with|font-family|tt|SSBM>, results vary
  considerably over the different runs. <with|font-family|tt|BAS>
  (<with|font-family|tt|SSBM>) estimated 8 (10) different models as the MPM
  and hence misidentified the true MPM at least 8 (9) times. More worrying
  is that these poor results do not seem to be due to variability. A more
  likely explanation can be found in Table<nbsp><reference|oz65incprob>,
  where we can clearly see that, on many occasions, the estimated MPM mimics
  the estimated HPM. For instance, <math|x<rsub|6>> and
  <math|x<rsub|4>*x<rsub|6>> (which are in the estimated HPM) are always in
  the MPM estimated by <with|font-family|tt|BAS>. This also seems to be the
  case for <math|x<rsub|4>*x<rsub|7>> and <math|x<rsub|6>*x<rsub|7>> in
  <with|font-family|tt|SSBM>. We interpret these results as a manifestation
  of the bias produced by methods conceived to look for good models.

  <paragraph|Regarding the HPM and probability mass discovered> In this
  respect the three methods behave quite similarly, with
  <with|font-family|tt|BAS> and <with|font-family|tt|Freq> perhaps performing
  slightly better than <with|font-family|tt|SSBM>. The best model found in
  the whole experiment (the ten runs of each of the three methods) has a
  Bayes factor (in its favor and against the null) whose decimal logarithm is
  50.87. This model, identified in Table<nbsp><reference|oz65incprob> with
  asterisks, was visited in four of the ten runs of
  <with|font-family|tt|BAS>, in three runs of <with|font-family|tt|Freq> and
  in one run of <with|font-family|tt|SSBM>. The means (standard deviations)
  over the ten runs of the decimal logarithm of the sum of the Bayes factors
  of the 1000 most probable distinct models explored were 52.77 (0.02) for
  <with|font-family|tt|Freq>, 52.78 (0.15) for <with|font-family|tt|BAS> and
  52.78 (0.16) for <with|font-family|tt|SSBM>. These results support the
  popular hypothesis that good models also show up when sampling from the
  posterior distribution.

  <\big-table>
    <with|font-size|0.84|<scalebox|0.75|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|7|7|cell-halign|c>|<cwith|1|-1|7|7|cell-rborder|1ln>|<cwith|1|-1|8|8|cell-halign|c>|<cwith|1|-1|8|8|cell-rborder|0ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|2|2|cell-col-span|3>|<cwith|1|1|2|2|cell-halign|c>|<cwith|1|1|2|2|cell-rborder|0ln>|<cwith|1|1|5|5|cell-col-span|3>|<cwith|1|1|5|5|cell-halign|c>|<cwith|1|1|5|5|cell-rborder|0ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|25|25|1|-1|cell-bborder|1ln>|<cwith|26|26|2|2|cell-col-span|6>|<cwith|26|26|2|2|cell-halign|c>|<cwith|26|26|2|2|cell-rborder|0ln>|<table|<row|<cell|>|<cell|HPM>|<cell|>|<cell|>|<cell|MPM>|<cell|>|<cell|>|<cell|<math|<wide|q|^><rsub|l><space|0.17em><around|(|<wide|V|^><around|(|<wide|q|^><rsub|l>|)><rsup|1/2>|)>>>>|<row|<cell|Method>|<cell|Freq>|<cell|BAS>|<cell|SSBM>|<cell|Freq>|<cell|BAS>|<cell|SSBM>|<cell|>>|<row|<cell|<math|x<rsub|1>>*>|<cell|7>|<cell|7>|<cell|8>|<cell|10>|<cell|8>|<cell|8>|<cell|0.575(0.002)>>|<row|<cell|<math|x<rsub|6>>*>|<cell|8>|<cell|9>|<cell|4>|<cell|->|<cell|10>|<cell|2>|<cell|0.427(0.002)>>|<row|<cell|<math|x<rsub|7>>>|<cell|->|<cell|->|<cell|3>|<cell|->|<cell|->|<cell|4>|<cell|0.290(0.001)>>|<row|<cell|<math|x<rsub|8>>>|<cell|1>|<cell|->|<cell|1>|<cell|->|<cell|->|<cell|1>|<cell|0.297(0.001)>>|<row|<cell|<math|x<rsub|10>>*>|<cell|3>|<cell|5>|<cell|4>|<cell|->|<cell|2>|<cell|2>|<cell|0.292(0.001)>>|<row|<cell|<math|x<rsub|1>*x<rsub|1>>*>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|1.000(<math|\<less\>>0.001)>>|<row|<cell|<math|x<rsub|1>*x<rsub|4>>>|<cell|3>|<cell|3>|<cell|2>|<cell|1>|<cell|3>|<cell|2>|<cell|0.497(0.002)>>|<row|<cell|<math|x<rsub|2>*x<rsub|8>>*>|<cell|9>|<cell|8>|<cell|4>|<cell|->|<cell|->|<cell|->|<cell|0.184(0.001)>>|<row|<cell|<math|x<rsub|4>*x<rsub|4>>>|<cell|->|<cell|->|<cell|1>|<cell|->|<cell|->|<cell|1>|<cell|0.266(0.001)>>|<row|<cell|<math|x<rsub|4>*x<rsub|6>>*>|<cell|8>|<cell|9>|<cell|1>|<cell|->|<cell|10>|<cell|2>|<cell|0.418(0.002)>>|<row|<cell|<math|x<rsub|4>*x<rsub|7>>>|<cell|2>|<cell|1>|<cell|5>|<cell|->|<cell|->|<cell|5>|<cell|0.337(0.001)>>|<row|<cell|<math|x<rsub|4>*x<rsub|8>>>|<cell|->|<cell|2>|<cell|3>|<cell|->|<cell|1>|<cell|3>|<cell|0.303(0.001)>>|<row|<cell|<math|x<rsub|4>*x<rsub|10>>>|<cell|6>|<cell|3>|<cell|4>|<cell|->|<cell|2>|<cell|4>|<cell|0.309(0.001)>>|<row|<cell|<math|x<rsub|5>*x<rsub|5>>*>|<cell|10>|<cell|9>|<cell|9>|<cell|->|<cell|->|<cell|->|<cell|0.261(0.001)>>|<row|<cell|<math|x<rsub|5>*x<rsub|7>>>|<cell|->|<cell|1>|<cell|->|<cell|->|<cell|->|<cell|->|<cell|0.156(0.001)>>|<row|<cell|<math|x<rsub|6>*x<rsub|6>>>|<cell|->|<cell|->|<cell|->|<cell|->|<cell|->|<cell|2>|<cell|0.218(0.001)>>|<row|<cell|<math|x<rsub|6>*x<rsub|7>>>|<cell|1>|<cell|2>|<cell|7>|<cell|10>|<cell|3>|<cell|7>|<cell|0.614(0.002)>>|<row|<cell|<math|x<rsub|6>*x<rsub|8>>>|<cell|->|<cell|->|<cell|->|<cell|->|<cell|->|<cell|6>|<cell|0.328(0.001)>>|<row|<cell|<math|x<rsub|6>*x<rsub|10>>>|<cell|1>|<cell|1>|<cell|2>|<cell|->|<cell|->|<cell|2>|<cell|0.237(0.001)>>|<row|<cell|<math|x<rsub|7>*x<rsub|7>>*>|<cell|7>|<cell|7>|<cell|2>|<cell|->|<cell|6>|<cell|2>|<cell|0.372(0.002)>>|<row|<cell|<math|x<rsub|7>*x<rsub|8>>>|<cell|1>|<cell|2>|<cell|2>|<cell|->|<cell|2>|<cell|2>|<cell|0.479(0.002)>>|<row|<cell|<math|x<rsub|7>*x<rsub|10>>*>|<cell|9>|<cell|9>|<cell|8>|<cell|10>|<cell|9>|<cell|8>|<cell|0.623(0.002)>>|<row|<cell|<math|x<rsub|9>*x<rsub|9>>*>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|10>|<cell|0.966(0.001)>>|<row|<cell|>|<cell|Number
    of different variables>|<cell|>|<cell|>|<cell|>|<cell|>|<cell|>|<cell|>>|<row|<cell|>|<cell|17>|<cell|18>|<cell|20>|<cell|6>|<cell|13>|<cell|20>|<cell|>>>>>
    ><label|oz65incprob>>
  </big-table|<with|font-size|0.84|For the Ozone65 dataset, the number of
  times each covariate is included in the estimated HPM and in the estimated
  MPM over ten independent runs of <math|n=100000> iterations of the
  <with|font-family|tt|Freq>, <with|font-family|tt|BAS> and
  <with|font-family|tt|SSBM> methods (a dash means never). Asterisks identify
  the best model encountered in the whole experiment. The last column gives
  the estimated inclusion probabilities <math|<wide|q|^><rsub|l>> from the
  first run of <with|font-family|tt|Freq> and, in parentheses, their
  estimated standard deviations from (<reference|var.tau>) using this same
  run.>>

  <section|Summary and main conclusions><label|sec.Ext>

  In the context of Bayesian model selection with very large model spaces
  <math|\<cal-M\>>, the quantities of interest <math|\<tau\>> have to be
  estimated because, in practice, their exact values are unknown. This is
  mainly because the underlying normalizing constant, whose determination
  would require computing the Bayes factors of all the models, cannot be
  obtained in practice. In this situation, estimates have to be constructed
  from a sample <math|\<cal-M\><rsup|\<ast\>>> of models from
  <math|\<cal-M\>>, either by using the empirical distribution or by
  renormalizing the Bayes factors. In the first approach,
  <math|\<cal-M\><rsup|\<ast\>>> must be a sample from the posterior
  distribution, usually obtained by MCMC sampling. In the second approach,
  <math|\<cal-M\><rsup|\<ast\>>> need not be obtained by a probabilistic
  mechanism, and the emphasis is normally placed on sampling good models. We
  labeled these approaches <with|font-shape|italic|empirical> and
  <with|font-shape|italic|re-normalized>, respectively. We have shown that,
  under the usual assumptions of MCMC sampling,
  <with|font-shape|italic|empirical> estimates are unbiased. Moreover, the
  uncertainty of the estimates can be easily quantified, which makes the
  <with|font-shape|italic|empirical> approach very appealing.

  We have compared several methods on a moderately large problem with
  <math|p=35> covariates, for which we derived the exact answers, and on a
  larger problem with <math|p=65> whose solution is unknown. With respect to
  sampling good models, and in particular the highest posterior probability
  model (a problem for which the normalizing constant is not needed),
  <with|font-shape|italic|empirical> and <with|font-shape|italic|re-normalized>
  methods behave quite similarly. However, in the estimation of other
  important quantities such as the inclusion probabilities,
  <with|font-shape|italic|re-normalized> methods can be strongly biased.

  <section*|Acknowledgements>

  We thank Rafael Espinosa, at the supercomputing center of the Institute
  for Research in Information Technology at the Universidad de Castilla-La
  Mancha, for his technical support. This work has been partially funded by
  the Spanish Ministry of Science and Education through project
  MTM2010-19528.

  <section|Gibbs sampling algorithm for hierarchical
  <math|g>-priors><label|ApGS>

  Once the parameters <math|<n|\<beta\>><rsub|\<gamma\>>> and
  <math|\<sigma\>> have been integrated out analytically (see
  <reference|gBF>), the only unknown parameters in the problem are <math|g>
  and the components of <math|<n|\<gamma\>>>. The full conditional
  distribution of each component <math|\<gamma\><rsub|i>> is

  <\equation>
    <label|FullCH>\<gamma\><rsub|i>\<mid\>\<gamma\><rsub|1>,\<ldots\>,\<gamma\><rsub|i-1>,\<gamma\><rsub|i+1>,\<ldots\>,\<gamma\><rsub|p>,g,<n|y>\<sim\><text|Bernoulli><around|(|p<rsub|i>|)>,
  </equation>

  where

  <\equation*>
    p<rsub|i>=<frac|B<rsub|a*0><around|(|g|)>*\<pi\>*<around|(|g\<mid\><n|\<gamma\>>=<n|a>|)>*P*r<around|(|M<rsub|a>|)>|B<rsub|a*0><around|(|g|)>*\<pi\>*<around|(|g\<mid\><n|\<gamma\>>=<n|a>|)>*P*r<around|(|M<rsub|a>|)>+B<rsub|b*0><around|(|g|)>*\<pi\>*<around|(|g\<mid\><n|\<gamma\>>=<n|b>|)>*P*r<around|(|M<rsub|b>|)>>,
  </equation*>

  where <math|M<rsub|a>> and <math|M<rsub|b>> denote the models with
  indicator vectors

  <\equation*>
    <n|a>=<around|(|\<gamma\><rsub|1>,\<ldots\>,\<gamma\><rsub|i-1>,1,\<gamma\><rsub|i+1>,\<ldots\>,\<gamma\><rsub|p>|)>,
  </equation*>

  and

  <\equation*>
    <n|b>=<around|(|\<gamma\><rsub|1>,\<ldots\>,\<gamma\><rsub|i-1>,0,\<gamma\><rsub|i+1>,\<ldots\>,\<gamma\><rsub|p>|)>.
  </equation*>
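  The probability <math|p<rsub|i>> is a ratio of terms that can differ by
  many orders of magnitude, so in practice it is safer to evaluate it on the
  log scale. A minimal Python sketch, assuming the caller supplies the
  logarithms of the two numerator-type terms for <math|<n|a>> and
  <math|<n|b>> (the function name <with|font-family|tt|inclusion_prob> is
  hypothetical):

```python
import math


def inclusion_prob(log_a, log_b):
    """Full-conditional probability p_i that gamma_i = 1.

    log_a: log of B_{a0}(g) * pi(g | gamma = a) * Pr(M_a)   (gamma_i = 1)
    log_b: log of B_{b0}(g) * pi(g | gamma = b) * Pr(M_b)   (gamma_i = 0)
    Returns a / (a + b) = 1 / (1 + exp(log_b - log_a)) without overflow.
    """
    d = log_b - log_a
    if d > 0:
        # Large positive d: rewrite as exp(-d) / (1 + exp(-d)).
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))
```

  Only the difference of the two logarithms matters, so any constant common
  to both terms (for instance, a normalizing constant of the model prior)
  can be dropped.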

  The full conditional for <math|g> is

  <\equation*>
    f<around|(|g\<mid\><n|\<gamma\>>,<n|y>|)>\<propto\>B<rsub|\<gamma\>*0><around|(|g|)>*<space|0.17em>\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>,
  </equation*>

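  which can easily be sampled with a Metropolis-Hastings step that uses the
  prior as an independence proposal,
  <math|g<rsup|\<ast\>>\<sim\>\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>>.
  Since the prior cancels in the acceptance ratio, <math|g<rsup|\<ast\>>> is
  accepted with probability
  <math|min<around|{|1,B<rsub|\<gamma\>*0><around|(|g<rsup|\<ast\>>|)>/B<rsub|\<gamma\>*0><around|(|g|)>|}>>.
  The case of <math|g>-priors, and of any other prior that leads to a
  closed-form expression, is a particular case of the above in which
  <math|g> is fixed, so that <math|\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>=1>
  and the step for <math|g> is omitted.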
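  The two steps above can be combined into a single Gibbs sampler. The
  Python sketch below is a minimal illustration under stated assumptions,
  not the implementation used for the results in this paper:
  <with|font-family|tt|log_bf>, <with|font-family|tt|log_prior_model>,
  <with|font-family|tt|log_prior_g> and <with|font-family|tt|sample_prior_g>
  are hypothetical user-supplied functions for
  <math|log B<rsub|\<gamma\>*0><around|(|g|)>>,
  <math|log Pr<around|(|M<rsub|\<gamma\>>|)>>,
  <math|log \<pi\><around|(|g\<mid\><n|\<gamma\>>|)>> and a draw from
  <math|\<pi\><around|(|g\<mid\><n|\<gamma\>>|)>>, and the chain is assumed
  to start at the null model.

```python
import math
import random


def gibbs_sampler(p, n_iter, log_bf, log_prior_model, log_prior_g,
                  sample_prior_g, g0=1.0, update_g=True, seed=0):
    """Gibbs sampler over (gamma, g); returns a list of (gamma, g) draws.

    Hypothetical user-supplied callables:
      log_bf(gamma, g)           -> log B_{gamma 0}(g)  (0 for the null model)
      log_prior_model(gamma)     -> log Pr(M_gamma)
      log_prior_g(g, gamma)      -> log pi(g | gamma)
      sample_prior_g(gamma, rng) -> a draw from pi(g | gamma)
    """
    rng = random.Random(seed)
    gamma = [0] * p          # assumption: start at the null model
    g = g0
    draws = []
    for _ in range(n_iter):
        # Step 1: update each gamma_i from its Bernoulli(p_i) full conditional.
        for i in range(p):
            a = gamma.copy()
            a[i] = 1
            b = gamma.copy()
            b[i] = 0
            log_a = log_bf(a, g) + log_prior_g(g, a) + log_prior_model(a)
            log_b = log_bf(b, g) + log_prior_g(g, b) + log_prior_model(b)
            d = log_b - log_a
            # p_i = 1 / (1 + exp(d)), evaluated without overflow.
            if d > 0:
                p_i = math.exp(-d) / (1.0 + math.exp(-d))
            else:
                p_i = 1.0 / (1.0 + math.exp(d))
            gamma[i] = 1 if rng.random() < p_i else 0
        # Step 2: independence Metropolis-Hastings for g with proposal
        # pi(g | gamma); the prior cancels, leaving the Bayes factor ratio.
        if update_g:
            g_star = sample_prior_g(gamma, rng)
            log_ratio = log_bf(gamma, g_star) - log_bf(gamma, g)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                g = g_star
        draws.append((tuple(gamma), g))
    return draws
```

  For the fixed-<math|g> case, one would call it with
  <with|font-family|tt|update_g=False>, <with|font-family|tt|g0> set to the
  chosen value and <with|font-family|tt|log_prior_g> returning 0. Inclusion
  probabilities can then be estimated by the empirical frequencies of each
  <math|\<gamma\><rsub|i>> over the returned draws.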
</body>