<TeXmacs|1.99.7>

<style|<tuple|ieeetran|std-latex>>

<\body>
  <\hide-preamble>
    <new-theorem|definition|Definition>

    <new-theorem|proposition|Proposition>

    <new-theorem|example|Example>

    <new-theorem|remark|Remark>

    <new-theorem|note|Note>

    <assign|Define|<macro|<above|=|\<triangle\>>>>

    <assign|inprob|<macro|<above|\<longrightarrow\>|p>>>

    <new-theorem|thm|Theorem>

    <new-theorem|cor|Corollary>

    <new-theorem|lem|Lemma>
  </hide-preamble>

  <doc-data|<doc-title|<with|font-size|1.68|Low-Complexity
  Detection/Equalization in Large-Dimension MIMO-ISI Channels Using Graphical
  Models>>|<doc-author|<author-data|<author-name|<with|font-size|1.19|Pritam
  Som, Tanumay Datta, N. Srinidhi, A. Chockalingam, and B. Sundar
  Rajan><next-line>Department of ECE, Indian Institute of Science,
  Bangalore-560012, India.>>>|<doc-date|<date|>>>

  <abstract-data|<\abstract>
    In this paper, we deal with low-complexity near-optimal
    detection/equalization in large-dimension multiple-input multiple-output
    inter-symbol interference (MIMO-ISI) channels using message passing on
    graphical models. A key contribution of the paper is the demonstration
    that near-optimal performance in MIMO-ISI channels with large dimensions
    can be achieved at low complexities through simple yet effective
    simplifications/approximations, although the graphical models that
    represent MIMO-ISI channels are fully/densely connected (loopy graphs).
    These include 1) the use of a Markov random field (MRF) based graphical
    model with pairwise interactions, in conjunction with
    <with|font-shape|italic|message/belief damping>, and 2) the use of a
    factor graph (FG) based graphical model with
    <with|font-shape|italic|Gaussian approximation of interference> (GAI).
    The per-symbol complexities are
    <math|O*<around|(|K<rsup|2>*n<rsub|t><rsup|2>|)>> and
    <math|O*<around|(|K*n<rsub|t>|)>> for the MRF and the FG with GAI
    approaches, respectively, where <math|K> and <math|n<rsub|t>> denote the
    number of channel uses per frame and the number of transmit antennas,
    respectively. These low complexities are attractive for large
    dimensions, i.e., for large <math|K*n<rsub|t>>. From a performance
    perspective, these algorithms are even more interesting in large
    dimensions, since their performance gets increasingly closer to optimum
    detection performance as <math|K*n<rsub|t>> increases. Also, we show
    that these message passing algorithms can be used in an iterative manner
    with local neighborhood search algorithms to improve the
    reliability/performance of <math|M>-QAM symbol detection.
  </abstract>>

  <markboth|Pritam Som <change-case|<with|font-shape|italic|et al.>|locase>:
  Low-Complexity Detection/Equalization in Large-Dimension MIMO-ISI Channels
  Using Graphical Models |Pritam Som <change-case|<with|font-shape|italic|et
  al.>|locase>: Low-Complexity Detection in Large-Dimension MIMO-ISI Channels
  Using Graphical Models >

  <section|Introduction><label|sec1>

  Signaling in large dimensions can offer attractive benefits in wireless
  communications. For example, transmission of signals using large spatial
  dimensions in multiple-input multiple-output (MIMO) systems with a large
  number of transmit/receive antennas can offer increased spectral
  efficiencies <cite|fosc98>-<cite|paulraj>. The spectral efficiency in a
  V-BLAST MIMO system is <math|n<rsub|t>> symbols per channel use, where
  <math|n<rsub|t>> is the number of transmit antennas <cite|paulraj>.
  Severely delay-spread inter-symbol interference (ISI) channels can offer
  opportunities to harness rich diversity benefits <cite|proakis>. In an
  <math|L>-length ISI channel, each symbol in a frame suffers interference
  from the <math|L-1> preceding symbols. At the same time, the availability
  of <math|L> copies of the transmitted signal in ISI channels can be
  exploited to achieve <math|L>th order diversity. One way to achieve this diversity is to organize
  data into frames, where each frame consists of <math|K> channel uses (i.e.,
  <math|K> dimensions in time), <math|K\<gtr\>L>, and carry out joint
  detection/equalization over the entire frame at the receiver. A MIMO-ISI
  channel with large <math|K*n<rsub|t>> and <math|L> (referred to as
  large-dimension MIMO-ISI channel) is of interest because of its potential
  to offer high spectral efficiencies (in large <math|n<rsub|t>>) and
  diversity orders (in large <math|L><footnote|A practical example of
  severely delay-spread ISI channel with large <math|L> is an ultra wideband
  (UWB) channel <cite|ngoc>. UWB channels are highly frequency-selective, and
  are characterized by severe ISI due to large delay spreads
  <cite|uwb0>-<cite|uwb3>. The number of multipath components (MPC) in such
  channels in indoor/industrial environments has been observed to be of the
  order of several tens to hundreds; numbers of MPCs ranging from 12 to 120
  are common in UWB channel models <cite|uwb0>,<cite|uwb3>.>). A major
  challenge, however, is detection complexity. The complexity of optimum
  detection is exponential in the number of dimensions, which is prohibitive
  when the number of dimensions is large. Our focus in this paper is to achieve
  near-optimal detection performance in large dimensions at low complexities.
  A powerful approach to realize this goal, which we investigate in this
  paper, is message passing on graphical models.

  Graphical models are graphs that indicate inter-dependencies between random
  variables <cite|frey>. Well-known graphical models include Bayesian belief
  networks, factor graphs, and Markov random fields <cite|merl>. Belief
  propagation (BP) is a technique that solves inference problems using
  graphical models <cite|merl>. BP is a simple, yet highly effective,
  technique that has been successfully employed in a variety of applications
  including computational biology, statistical signal/image processing, data
  mining, etc. BP is well suited to several communication problems as well
  <cite|frey>; e.g., decoding of turbo codes and LDPC codes
  <cite|bp_turbo>,<cite|ldpc>, multiuser detection in CDMA
  <cite|bpmud0>-<cite|bpmud2>, and MIMO detection <cite|ieee06>-<cite|itw10>.

  Turbo equalization, which performs detection/equalization and decoding in
  an iterative manner in coded data transmission over ISI channels, has been
  widely studied <cite|douil_95>,<cite|turbo_eq>,<cite|teq_mag>. More
  recently, message passing on factor-graph based graphical models
  <cite|fg_sp> has been studied for detection/equalization on ISI channels
  <cite|euro_04>-<cite|wo>. In <cite|isi2>, it has been shown through
  simulations that the application of the sum-product (SP) algorithm to
  factor graphs in ISI channels converges to a good approximation of the
  exact a posteriori probability (APP) of the transmitted symbols. In
  <cite|fg_eq>, the problem of finding the linear minimum mean square error
  (LMMSE) estimate of the transmitted symbol sequence is addressed using a
  factor graph framework. Equalization in MIMO-ISI channels using factor
  graphs is investigated in <cite|mimo_isi>,<cite|wo>. In <cite|mimo_isi>, variable
  nodes of the factor graph correspond to the transmitted symbols, and each
  channel use corresponds to a function node. Since the received signal at
  any channel use depends on the past <math|L> symbols transmitted from every
  transmit antenna, every function node is connected to <math|L*n<rsub|t>>
  variable nodes. Near-MAP (maximum a posteriori probability) performance was
  shown through simulations for <math|n<rsub|t>=2> systems. However, the
  complexity involved in computing the messages at the variable and
  function nodes is exponential in <math|L*n<rsub|t>>, which is prohibitive
  for large spatial dimensions and delay spreads. In <cite|wo>, a Gaussian
  approximation of interference is used, which significantly reduces the
  complexity and makes it scale well for large <math|L>. However, in terms of
  performance, the algorithm in <cite|wo> exhibits high error
  floors<footnote|Figure <reference|fig13a> shows an error floor for the
  approach in <cite|wo>, whereas, in the same figure, our FG approach in Sec.
  <reference|sec4> avoids flooring and performs significantly better.>.

  Our key contribution in this paper is the demonstration that graphical
  models can be effectively used to achieve
  <with|font-shape|italic|near-optimal> detection/equalization performance in
  <with|font-shape|italic|large-dimension> MIMO-ISI channels at
  <with|font-shape|italic|low complexities>. The achieved performance is good
  because detection is performed jointly over the entire frame of data; i.e.,
  over the full <math|K*n<rsub|t>\<times\>1> data vector. While simple
  approximations/simplifications resulted in low complexities, the
  <with|font-shape|italic|large-dimension behavior><footnote|We say that an
  algorithm exhibits `large-dimension behavior' if its bit error performance
  improves with increasing number of dimensions. The fact that turbo codes
  with BP decoding achieve near-capacity performance only when the
  <with|font-shape|italic|frame sizes are large> is an instance of
  large-dimension behavior.> natural in message passing algorithms
  contributed to the near-optimal performance in large dimensions. The
  graphical models we consider in this paper are Markov random fields (MRF)
  and factor graphs (FG). We show that these graphical-model based
  algorithms perform increasingly close to the optimum performance as
  <math|n<rsub|t>>, <math|K>, and <math|L> increase, with <math|L/K> kept
  fixed.

  In the case of the MRF approach (Section <reference|sec3>), we show that the
  use of <with|font-shape|italic|damping> of messages/beliefs, where
  messages/beliefs are computed as a weighted average of the message/belief
  in the previous iteration and the current iteration (details and associated
  references given in Section <reference|sec>), is instrumental in achieving
  good performance. Simulation results show that the MRF approach exhibits
  large-dimension behavior, and that damping significantly improves the bit
  error performance (details given in Section <reference|sec>). For example,
  the MRF based algorithm with message damping performs within 0.25 dB of
  the unfaded single-input single-output (SISO) AWGN performance (which is a
  lower bound on the optimum detector performance) at a bit error rate (BER)
  of <math|10<rsup|-3>> in a MIMO-ISI channel with
  <math|n<rsub|t>=n<rsub|r>=4>, <math|K=100> channel uses per frame (i.e., a
  problem size of <math|K*n<rsub|t>=400> dimensions), and <math|L=20>
  equal-energy multipath components (MPC). Similar performance is shown for
  large-MIMO systems with <math|n<rsub|t>=n<rsub|r>=16,32> and <math|K=64>
  (problem sizes of <math|K*n<rsub|t>=1024> and 2048 dimensions). The per-symbol
  complexity of the MRF approach is <math|O*<around|(|K<rsup|2>*n<rsub|t><rsup|2>|)>>
  (details in Section <reference|sec>).

  In the case of the FG approach (Section <reference|sec4>), the Gaussian
  approximation of interference (GAI) we adopt is found to reduce the
  complexity further by an order (Section <reference|sec>); i.e.,
  the per-symbol complexity of the FG with GAI approach is just
  <math|O*<around|(|K*n<rsub|t>|)>>, which is one order less than that of the
  MRF approach. The proposed FG with GAI approach is also shown to exhibit
  large-dimension behavior; its BER performance is almost the same as that of
  the MRF approach, and is significantly better than that of the scheme in
  <cite|wo> (Section <reference|sec>). We also show that the proposed FG with
  GAI algorithm can be used in an iterative manner with local neighborhood
  search algorithms, like the reactive tabu search (RTS) algorithm in
  <cite|isi_gcom09>, to improve the performance of <math|M>-QAM detection
  (Section <reference|sec5>).

  Though the proposed algorithms are presented in the context of uncoded
  systems, they can be extended to coded systems as well, through turbo
  equalization <cite|douil_95>-<cite|teq_mag> <left|(>Receiver C in Fig. 1 of
  <cite|teq_mag><right|)> or through joint processing of the entire coded
  frame using low-complexity graphical models <left|(>low-complexity
  approximations of Receiver A in Fig. 1 of <cite|teq_mag><right|)>. In
  <cite|bp_isit09>, we investigated a scheme with separate MRF based
  detection followed by decoding <left|(>Receiver B in Fig. 1 of
  <cite|teq_mag><right|)> in a <math|24\<times\>24> large-MIMO system, and
  showed that it achieves a coded BER performance within 2.5 dB of the
  theoretical ergodic MIMO capacity. MIMO space-time coding
  schemes that can achieve separability of detection and decoding without
  loss of optimality <cite|blld> are interesting because they avoid the need
  for joint processing for optimal detection and decoding. If such
  detection-decoding separable space-time codes become available for large
  dimensions, the proposed algorithms could be applied to their
  detection/equalization.

  The rest of the paper is organized as follows. In Section <reference|sec2>,
  we present the considered MIMO system model in frequency-selective fading.
  In Section <reference|sec3>, we present the proposed MRF based BP detector
  with damping and its BER performance in large dimensions. Section
  <reference|sec4> presents the FG with GAI based BP detector and its BER
  performance. In Section <reference|sec5>, the proposed hybrid RTS-BP
  algorithm for detection of <math|M>-QAM signals and its performance are
  presented. Conclusions are presented in Section <reference|sec6>.

  <section|System Model><label|sec2>

  We consider MIMO systems with cyclic prefixed single-carrier (CPSC)
  signaling, where the overall MIMO channel includes an FFT operation so that
  the transmitted symbols are estimated from the received frequency-domain
  signal (also referred to as SC-FDE: single-carrier modulation with
  frequency-domain equalization) <cite|cpsc1>-<cite|cpsc3>. Unlike OFDM
  signaling, CPSC signaling does not suffer from a high peak-to-average
  power ratio (PAPR). Also, CPSC with an FD-MMSE equalizer performs better
  than OFDM at large frame sizes (large <math|K>) <cite|cpsc3>. We will see
  that our proposed BP based algorithms scale well for large dimensions in
  MIMO-CPSC schemes (large <math|K*n<rsub|t>>) and perform significantly
  better than MIMO-CPSC with an FD-MMSE equalizer as well as MIMO-OFDM with
  an MMSE/ML equalizer.

  <big-figure|<image|fig1.eps|3.45in|1.85in||><label|fig1>|MIMO-ISI Channel
  Model.>

  Consider a frequency-selective MIMO channel with <math|n<rsub|t>> transmit
  and <math|n<rsub|r>> receive antennas as shown in Fig. <reference|fig1>.
  Let <math|L> denote the number of multipath components (MPC). Data is
  transmitted in frames, where each frame has <math|K<rprime|'>> channel
  uses, out of which data symbol vectors are sent in <math|K> channel uses,
  with <math|K\<geq\>L>. These <math|K> channel uses are preceded by a cyclic
  prefix (CP) of length <math|L-1> channel uses so that
  <math|K<rprime|'>=K+L-1>. In each channel use, an <math|n<rsub|t>>-length
  data symbol vector is transmitted using spatial multiplexing on
  <math|n<rsub|t>> transmit antennas. Let
  <math|<with|font-series|bold|x><rsub|q>\<in\>{\<pm\>1}<rsup|n<rsub|t>>>
  denote the data symbol vector transmitted in the <math|q>th channel use,
  <math|q=0,1,\<cdots\>,K-1>. Though the symbol alphabet used here is BPSK,
  extensions to higher-order alphabets are possible, and some are discussed
  later in the paper. While the CP avoids inter-frame interference, ISI
  remains within the frame. The received signal vector at time <math|q> can be
  written as

  <eqnarray|<tformat|<table|<row|<cell|<math-bf|y><rsub|q>>|<cell|=>|<cell|<big|sum><rsub|l=0><rsup|L-1><math-bf|H><rsub|l><space|0.17em><math-bf|x><rsub|q-l>+<math-bf|w><rsub|q>,<space|1em><space|0.17em><space|0.17em>q=0,\<cdots\>,K-1,<eq-number><label|eq1>>>>>>

  where <math|<math-bf|y><rsub|q>\<in\>\<bbb-C\><rsup|n<rsub|r>>> and
  <math|<math-bf|H><rsub|l>\<in\>\<bbb-C\><rsup|n<rsub|r>\<times\>n<rsub|t>>>
  is the channel gain matrix for the <math|l>th MPC, whose entry
  <math|H<rsub|j,i><rsup|<around|(|l|)>>> on the <math|j>th row and
  <math|i>th column is the channel from the <math|i>th transmit antenna to
  the <math|j>th receive antenna on the <math|l>th MPC. Because of the CP,
  the symbol index in (<reference|eq1>) is taken modulo <math|K>, i.e.,
  <math|<math-bf|x><rsub|q-l>=<math-bf|x><rsub|q-l+K>> for
  <math|q-l\<less\>0>. The entries of <math|<math-bf|H><rsub|l>> are assumed
  to be i.i.d. <math|\<bbb-C\>*\<cal-N\><around|(|0,1|)>>. It is further
  assumed that <math|<math-bf|H><rsub|l>>, <math|l=0,\<cdots\>,L-1>, remain
  constant for one frame duration and vary i.i.d. from one frame to the next.
  <math|<math-bf|w><rsub|q>\<in\>\<bbb-C\><rsup|n<rsub|r>>> is the additive
  white Gaussian noise vector at time <math|q>, whose entries are
  independent, each with variance <math|\<sigma\><rsup|2>=n<rsub|t>*L*E<rsub|s>/\<gamma\>>,
  where <math|E<rsub|s>> is the average energy per transmitted symbol and
  <math|\<gamma\>> is the average received SNR per receive antenna; each
  receive antenna collects <math|n<rsub|t>*L> independent signal terms of
  average energy <math|E<rsub|s>>. The CP turns the linear convolution with
  the channel into a circular convolution, so the channel is multiplicative
  in the frequency domain. Hence, the received signal in the frequency domain
  for the <math|i>th frequency index (<math|0\<leq\>i\<leq\>K-1>) can be
  written as

  <eqnarray|<tformat|<table|<row|<cell|<math-bf|r><rsub|i>>|<cell|=>|<cell|<math-bf|G><rsub|i><math-bf|u><rsub|i>+<math-bf|v><rsub|i>,<eq-number><label|eq3>>>>>>

  where <math|<math-bf|r><rsub|i>=<frac|1|<sqrt|K>>*<big|sum><rsub|q=0><rsup|K-1>e<rsup|<frac|-2*\<pi\><math-bf|j>q*i|K>><math-bf|y><rsub|q>,<space|0.17em><space|0.17em>>
  <with|font-size|1|mode|math|<math-bf|u><rsub|i>=<frac|1|<sqrt|K>>*<big|sum><rsub|q=0><rsup|K-1>e<rsup|<frac|-2*\<pi\><math-bf|j>q*i|K>><math-bf|x><rsub|q>,<space|0.17em><space|0.17em>>
  <math|<math-bf|v><rsub|i>=<frac|1|<sqrt|K>>*<big|sum><rsub|q=0><rsup|K-1>e<rsup|<frac|-2*\<pi\><math-bf|j>q*i|K>><math-bf|w><rsub|q>,<space|0.17em><space|0.17em>>
  <math|<math-bf|G><rsub|i>=<big|sum><rsub|l=0><rsup|L-1>e<rsup|<frac|-2*\<pi\><math-bf|j>l*i|K>><math-bf|H><rsub|l>>,
  and <math|<math-bf|j>=<sqrt|-1>>. Stacking the <math|K> vectors
  <math|<math-bf|r><rsub|i>>, <math|i=0,\<cdots\>,K-1>, we write

  <eqnarray|<tformat|<table|<row|<cell|<math-bf|r>>|<cell|=>|<cell|<wide*|<math-bf|GF>|\<wide-underbrace\>><rsub|<Define><space|0.17em><space|0.17em><math-bf|H><rsub|e*f*f>><math-bf|x><rsub|e*f*f>+<math-bf|v><rsub|e*f*f>,<eq-number><label|eqChannelModel>>>>>>

  where

  <\equation*>
    <math-bf|r>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<math-bf|r><rsub|0>>>|<row|<cell|<math-bf|r><rsub|1>>>|<row|<cell|\<vdots\>>>|<row|<cell|<math-bf|r><rsub|<with|font-size|0.84|K-1>>>>>>><space|-2mm>|]><with|math-font-family|rm|,><space|1em><math-bf|G>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<math-bf|G><rsub|0>>|<cell|>>|<row|<cell|>|<cell|<math-bf|G><rsub|1>>>>>>>|<cell|<with|math-font-family|bf|0>>>|<row|<cell|<with|math-font-family|bf|0>>|<cell|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|\<ddots\>>|<cell|>>|<row|<cell|>|<cell|<math-bf|G><rsub|<with|font-size|0.84|K-1>>>>>>>>>>>><space|-3mm>|]><with|math-font-family|rm|,><space|1em>
  </equation*>

  <\equation*>
    <math-bf|x><rsub|e*f*f>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<math-bf|x><rsub|0>>>|<row|<cell|<math-bf|x><rsub|1>>>|<row|<cell|\<vdots\>>>|<row|<cell|<math-bf|x><rsub|<with|font-size|0.84|K-1>>>>>>><space|-2mm>|]><with|math-font-family|rm|,><space|1em><math-bf|v><rsub|e*f*f>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<math-bf|v><rsub|0>>>|<row|<cell|<math-bf|v><rsub|1>>>|<row|<cell|\<vdots\>>>|<row|<cell|<math-bf|v><rsub|<with|font-size|0.84|K-1>>>>>>>|]>,
  </equation*>

  <eqnarray*|<tformat|<table|<row|<cell|<math-bf|F>>|<cell|<space|-2mm>=>|<cell|<space|-2mm><frac|1|<sqrt|K>><around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|3|3|cell-halign|l>|<cwith|1|-1|4|4|cell-halign|l>|<cwith|1|-1|4|4|cell-rborder|0ln>|<table|<row|<cell|\<rho\><rsub|<with|font-size|0.84|0,0>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<rho\><rsub|<with|font-size|0.84|1,0>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<cdots\>>|<cell|\<rho\><rsub|<with|font-size|0.84|K-1,0>><math-bf|I><rsub|n<rsub|t>>>>|<row|<cell|\<rho\><rsub|<with|font-size|0.84|0,1>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<rho\><rsub|<with|font-size|0.84|1,1>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<cdots\>>|<cell|\<rho\><rsub|<with|font-size|0.84|K-1,1>><math-bf|I><rsub|n<rsub|t>>>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<cdots\>>|<cell|\<vdots\>>>|<row|<cell|\<rho\><rsub|<with|font-size|0.84|0,K-1>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<rho\><rsub|<with|font-size|0.84|1,K-1>><math-bf|I><rsub|n<rsub|t>>>|<cell|\<cdots\>>|<cell|\<rho\><rsub|<with|font-size|0.84|K-1,K-1>><math-bf|I><rsub|n<rsub|t>>>>>>><space|-2mm>|]>>>|<row|<cell|>|<cell|=>|<cell|<frac|1|<sqrt|K>><math-bf|D><rsub|K>\<otimes\><math-bf|I><rsub|n<rsub|t>>,>>>>>

  where <math|\<rho\><rsub|q,i>=e<rsup|<frac|-2*\<pi\><math-bf|j>q*i|K>>>,
  <math|<math-bf|D><rsub|K>> is the (unnormalized) <math|K>-point DFT matrix,
  whose <math|<around|(|i,q|)>>th entry is <math|\<rho\><rsub|q,i>>, and
  <math|\<otimes\>> denotes the Kronecker product. Equation
  (<reference|eqChannelModel>) can be written as an equivalent linear vector
  channel model of the form

  <eqnarray|<tformat|<table|<row|<cell|<math-bf|r>>|<cell|=>|<cell|<math-bf|Hx>+<math-bf|v>,<eq-number><label|eqn1>>>>>>

  where <math|<with|font-series|bold|H>=<with|font-series|bold|H><rsub|e*f*f>>,
  <math|<with|font-series|bold|x>=<with|font-series|bold|x><rsub|e*f*f>>, and
  <math|<with|font-series|bold|v>=<math-bf|v><rsub|e*f*f>>. Note that the
  well-known MIMO system model for flat fading can be obtained as a special
  case of the above system model with <math|L=K=1>.
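  As an illustration only (our own sketch, not code from this paper), the
  following Python/NumPy snippet simulates one frame of (<reference|eq1>)
  with BPSK symbols and <math|E<rsub|s>=1>, applies the cyclic indexing
  implied by the CP, builds <math|<math-bf|H><rsub|e*f*f>=<math-bf|G>*<math-bf|F>>
  block by block, and checks numerically that the stacked unitary DFT of the
  received frame satisfies (<reference|eqn1>). The helper names
  <verbatim|simulate_frame> and <verbatim|effective_channel>, the frame
  sizes, and the SNR value are hypothetical choices for this example.

```python
import numpy as np

def simulate_frame(K, L, nt, nr, snr, rng):
    """Hypothetical helper: one CPSC MIMO-ISI frame with BPSK symbols and Es = 1."""
    # Channel taps H_l, l = 0..L-1, with i.i.d. CN(0, 1) entries
    H = (rng.standard_normal((L, nr, nt))
         + 1j * rng.standard_normal((L, nr, nt))) / np.sqrt(2)
    # Row q of X is the BPSK data vector x_q, q = 0..K-1
    X = rng.choice([-1.0, 1.0], size=(K, nt))
    # Noise variance sigma^2 = nt * L * Es / gamma, with Es = 1
    sigma2 = nt * L / snr
    W = np.sqrt(sigma2 / 2) * (rng.standard_normal((K, nr))
                               + 1j * rng.standard_normal((K, nr)))
    # Model (1) after CP removal: the CP makes the convolution circular,
    # so x_{q-l} is read as x_{(q-l) mod K}
    Y = W.copy()
    for q in range(K):
        for l in range(L):
            Y[q] += H[l] @ X[(q - l) % K]
    return H, X, W, Y, sigma2

def effective_channel(H, K):
    """H_eff = G F, with G = blockdiag(G_0, ..., G_{K-1}) and F = kron(D_K, I_nt) / sqrt(K)."""
    L, nr, nt = H.shape
    G = np.zeros((K * nr, K * nt), dtype=complex)
    for i in range(K):
        # G_i = sum_l exp(-2 pi j l i / K) H_l
        Gi = sum(np.exp(-2j * np.pi * l * i / K) * H[l] for l in range(L))
        G[i * nr:(i + 1) * nr, i * nt:(i + 1) * nt] = Gi
    idx = np.arange(K)
    D = np.exp(-2j * np.pi * np.outer(idx, idx) / K)  # unnormalized K-point DFT
    F = np.kron(D, np.eye(nt)) / np.sqrt(K)
    return G @ F

rng = np.random.default_rng(0)
K, L, nt, nr = 8, 3, 2, 2
H, X, W, Y, sigma2 = simulate_frame(K, L, nt, nr, snr=10.0, rng=rng)
H_eff = effective_channel(H, K)
# r_i and v_i are the unitary DFTs of y_q and w_q along time, stacked over i
r = (np.fft.fft(Y, axis=0) / np.sqrt(K)).reshape(-1)
v = (np.fft.fft(W, axis=0) / np.sqrt(K)).reshape(-1)
x = X.reshape(-1)  # x_eff = [x_0; x_1; ...; x_{K-1}]
print(np.allclose(r, H_eff @ x + v))  # True
```

  Calling <verbatim|effective_channel(H[:1], 1)> returns
  <math|<math-bf|H><rsub|0>> itself, i.e., the flat-fading case
  <math|L=K=1>.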

  We further note that, in the considered system, signaling is done along
  <math|K> dimensions in time and <math|n<rsub|t>> dimensions in space, so
  that the total number of dimensions involved is <math|K*n<rsub|t>>. We are
  interested in low-complexity detection/equalization in large dimensions
  (i.e., for large <math|K*n<rsub|t>>) using graphical models. The goal is to
  obtain an estimate of the vector <math|<with|font-series|bold|x>>, given
  <math|<with|font-series|bold|r>> and the knowledge of
  <math|<with|font-series|bold|H>>. The optimal maximum a posteriori
  probability (MAP) detector takes the joint posterior distribution

  <eqnarray|<tformat|<table|<row|<cell|p<around|(|<with|font-series|bold|x>\<mid\><with|font-series|bold|r>,<with|font-series|bold|H>|)>>|<cell|\<propto\>>|<cell|p<around|(|<with|font-series|bold|r>\<mid\><with|font-series|bold|x>,<with|font-series|bold|H>|)>*<space|0.17em>p<around|(|<with|font-series|bold|x>|)>,<eq-number><label|eqnd1>>>>>>

  and marginalizes out each variable as <math|p<around|(|x<rsub|i>\|<with|font-series|bold|r>,<with|font-series|bold|H>|)>=<big|sum><rsub|x<rsub|-i>>p<around|(|<with|font-series|bold|x>\|<with|font-series|bold|r>,<with|font-series|bold|H>|)>>,
  where <math|x<rsub|-i>> stands for all entries of
  <math|<with|font-series|bold|x>> except <math|x<rsub|i>>. The MAP estimate
  of the bit <math|x<rsub|i>>, <math|i=1,\<cdots\>,K*n<rsub|t>>, is then
  given by

  <eqnarray|<tformat|<table|<row|<cell|<wide|x|^><rsub|i>>|<cell|=>|<cell|<ontop|<text|arg
  max>|a\<in\>{\<pm\>1}><space|0.17em><space|0.17em>p*<around*|(|x<rsub|i>=a\<mid\><with|font-series|bold|r>,<with|font-series|bold|H>|)>,<eq-number><label|MAPdetection>>>>>>

  whose complexity is exponential in <math|K*n<rsub|t>>, since each marginal
  is a sum over all <math|2<rsup|K*n<rsub|t>-1>> possible values of
  <math|x<rsub|-i>>. In the following sections, we present low-complexity
  detection algorithms based on graphical models suited to the system model
  in (<reference|eqn1>) with large dimensions, i.e., for large <math|K>,
  <math|L>, <math|n<rsub|t>>, keeping <math|L/K> fixed.
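  To make the exponential cost concrete, the Python sketch below (an
  illustration, not the authors' code) computes the MAP marginals of
  (<reference|MAPdetection>) by enumerating all <math|2<rsup|N>> BPSK
  vectors, with <math|N=K*n<rsub|t>>. It assumes a uniform prior
  <math|p<around|(|<with|font-series|bold|x>|)>>, uses the Gaussian exponent
  <math|-<around|\<\|\|\>|<with|font-series|bold|r>-<with|font-series|bold|Hx>|\<\|\|\>><rsup|2>/<around|(|2*\<sigma\><rsup|2>|)>>
  that appears in (<reference|dist>), and draws a random complex matrix as a
  stand-in for <math|<with|font-series|bold|H>>. The function name
  <verbatim|map_marginals> and all numerical values are hypothetical.

```python
import itertools
import numpy as np

def map_marginals(r, H, sigma2):
    """Exact p(x_i = +1 | r, H) for BPSK x by enumerating all 2^N candidates.

    Assumes a uniform prior and the likelihood exp(-norm(r - Hx)^2 / (2 sigma2)).
    The work grows as 2^N with N = K * nt, so this is feasible only for tiny N.
    """
    N = H.shape[1]
    cands = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))  # (2^N, N)
    resid = r[None, :] - cands @ H.T          # row k holds r - H x_k
    logp = -np.sum(np.abs(resid) ** 2, axis=1) / (2 * sigma2)
    post = np.exp(logp - logp.max())          # subtract the max for numerical safety
    post /= post.sum()                        # joint posterior over all 2^N vectors
    # Marginalize: (cands + 1) / 2 is the indicator of x_i = +1
    return post @ ((cands + 1.0) / 2.0)

rng = np.random.default_rng(1)
N, M, sigma2 = 8, 8, 1e-3                     # e.g. K * nt = 8 unknowns
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=N)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
r = H @ x + noise
p_plus = map_marginals(r, H, sigma2)
# Symbol-wise MAP decision: pick the more probable value of each x_i
x_hat = np.where(np.greater_equal(p_plus, 0.5), 1.0, -1.0)
print(2 ** N, "candidate vectors enumerated")  # 256 candidate vectors enumerated
```

  Doubling <math|N> squares the number of candidates, which is why this
  exhaustive approach is infeasible for the problem sizes of interest here
  (e.g., <math|K*n<rsub|t>=400>).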

  <section|Detection Using BP on Markov Random Fields ><label|sec3>

  In this section, we present a detection algorithm based on message passing
  on an MRF graphical model of the MIMO system model in (<reference|eqn1>)
  <cite|isi_icc2010>.

  <subsection|Markov Random Fields><label|sec>

  An undirected graph is given by <math|G=<around|(|V,E|)>>, where <math|V>
  is the set of nodes and <math|E\<subseteq\><around*|{|<around|(|i,j|)>:i,j\<in\>V,i\<neq\>j|}>>
  is the set of undirected edges. An MRF is an undirected graph whose
  vertices are random variables <cite|Pearl1988>,<cite|frey>. The statistical
  dependencies among the variables are such that any variable is
  conditionally independent of all the other variables, given its neighbors.
  Usually, the variables in an MRF are constrained by a
  <with|font-shape|italic|compatibility function>, also known as a
  <with|font-shape|italic|clique potential> in the literature. A
  <with|font-shape|italic|clique> of an MRF is a fully connected sub-graph,
  i.e., it is a subset <math|C\<subseteq\>V> such that
  <math|<around|(|i,j|)>\<in\>E> for all <math|i,j\<in\>C>. A clique is
  <with|font-shape|italic|maximal> if it is not a strict subset of another
  clique. Therefore, a maximal clique does not remain fully connected if any
  additional vertex of the MRF is included in it. For example, in the MRF
  shown in Fig. <reference|fig2>, <math|<around*|{|x<rsub|1>,x<rsub|2>,x<rsub|3>,x<rsub|4>|}>>
  and <math|<around*|{|x<rsub|3>,x<rsub|4>,x<rsub|5>|}>> are two maximal
  cliques.
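  The notion of a maximal clique can be checked mechanically. The Python
  sketch below (an illustration only) lists all maximal cliques of a small
  graph with the basic Bron-Kerbosch recursion without pivoting. Since the
  edges of Fig. <reference|fig2> are not listed in the text, the edge set is
  our assumption: <math|x<rsub|1>,\<ldots\>,x<rsub|4>> are pairwise
  connected and <math|x<rsub|5>> is connected to <math|x<rsub|3>> and
  <math|x<rsub|4>> only, which is consistent with the two maximal cliques
  named above.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting).

    R: current clique, P: candidates that can still extend R,
    X: vertices already processed; R is maximal when P and X are both empty.
    """
    if not P and not X:
        out.append(sorted(R))
        return
    for v in list(P):
        bron_kerbosch(R.union({v}), P.intersection(adj[v]),
                      X.intersection(adj[v]), adj, out)
        P = P.difference({v})
        X = X.union({v})

# Assumed edge set for the MRF of Fig. 2 (nodes 1..5 stand for x_1..x_5)
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(1, 6)}
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(cliques))  # [[1, 2, 3, 4], [3, 4, 5]]
```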

  <big-figure|<with|par-mode|center|<image|fig2.eps|2.25in|1.30in||><label|fig2>>|An
  example of MRF.>

  Let there be <math|N<rsub|c>> maximal cliques in the MRF, and
  <math|<math-bf|x><rsub|j>> be the variables in maximal clique <math|j>. Let
  <math|\<psi\><rsub|j><around*|(|<math-bf|x><rsub|j>|)>> be the clique
  potential of clique <math|j>. Then, by the Hammersley-Clifford theorem
  <cite|griffeath>, a strictly positive joint distribution of the variables
  that satisfies the Markov property of the MRF factorizes as

  <eqnarray|<tformat|<table|<row|<cell|p<around*|(|<math-bf|x>|)>>|<cell|=>|<cell|<frac|1|Z>*<big|prod><rsub|j=1><rsup|N<rsub|c>>\<psi\><rsub|j><around*|(|<math-bf|x><rsub|j>|)>,<eq-number>>>>>>

  where <math|Z> is a normalizing constant, also known as the
  <with|font-shape|italic|partition function>, chosen to ensure that the
  distribution sums to one. In Fig. <reference|fig2>, with two maximal
  cliques in the MRF, namely, <math|<around*|{|x<rsub|1>,x<rsub|2>,x<rsub|3>,x<rsub|4>|}>>
  and <math|<around*|{|x<rsub|3>,x<rsub|4>,x<rsub|5>|}>>, the joint
  probability distribution is given by

  <eqnarray|<tformat|<table|<row|<cell|p<around*|(|<math-bf|x>|)>>|<cell|=>|<cell|<frac|1|Z>*<space|0.17em>\<psi\><rsub|1><around*|(|x<rsub|1>,x<rsub|2>,x<rsub|3>,x<rsub|4>|)>*\<psi\><rsub|2><around*|(|x<rsub|3>,x<rsub|4>,x<rsub|5>|)>.<eq-number>>>>>>
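  The factorization in the example above can also be checked numerically.
  In the Python sketch below, <math|\<psi\><rsub|1>> and
  <math|\<psi\><rsub|2>> are arbitrary positive functions of binary
  variables <math|x<rsub|k>\<in\>{\<pm\>1}>, chosen only for illustration
  and not taken from the paper. The sketch computes <math|Z> by enumerating
  all 32 configurations and confirms the Markov property: given its
  neighbors <math|x<rsub|3>> and <math|x<rsub|4>>, the variable
  <math|x<rsub|5>> does not depend on <math|x<rsub|1>> or <math|x<rsub|2>>.

```python
import itertools
import numpy as np

# Hypothetical clique potentials for the two maximal cliques of Fig. 2;
# any strictly positive functions would do.
def psi1(x1, x2, x3, x4):
    return np.exp(0.5 * x1 * x2 + 0.3 * x1 * x3 - 0.7 * x2 * x4 + 0.4 * x3 * x4)

def psi2(x3, x4, x5):
    return np.exp(0.8 * x3 * x5 - 0.4 * x4 * x5 + 0.2 * x5)

configs = list(itertools.product([-1, 1], repeat=5))           # all 32 values of x
unnorm = {x: psi1(*x[:4]) * psi2(*x[2:]) for x in configs}     # product of potentials
Z = sum(unnorm.values())                                       # partition function
p = {x: w / Z for x, w in unnorm.items()}                      # joint p(x)

def cond_x5(x1, x2, x3, x4):
    """p(x5 = +1 | x1, x2, x3, x4), read off the joint table."""
    num = p[(x1, x2, x3, x4, 1)]
    return num / (num + p[(x1, x2, x3, x4, -1)])

# Markov property: for each (x3, x4), the conditional of x5 ignores x1 and x2
ok = all(
    np.allclose([cond_x5(a, b, c, d) for a in (-1, 1) for b in (-1, 1)],
                cond_x5(-1, -1, c, d))
    for c in (-1, 1) for d in (-1, 1))
print(np.isclose(sum(p.values()), 1.0), ok)  # True True
```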

  <with|font-shape|italic|Pairwise MRF:> An MRF is called a
  <with|font-shape|italic|pairwise> MRF if all the maximal cliques in the MRF
  are of size two. In this case, the clique potentials are all functions of
  two variables. The joint distribution in such a case takes the form
  <cite|merl>

  <eqnarray|<tformat|<table|<row|<cell|p<around*|(|<math-bf|x>|)>>|<cell|\<propto\>>|<cell|<around*|(|<big|prod><rsub|<around|(|i,j|)>>\<psi\><rsub|i,j><around*|(|x<rsub|i>,x<rsub|j>|)>|)><around*|(|<big|prod><rsub|i>\<phi\><rsub|i><around*|(|x<rsub|i>|)>|)>,<eq-number><label|eqnz2>>>>>>

  where <math|\<psi\><rsub|i,j><around*|(|x<rsub|i>,x<rsub|j>|)>> is the
  clique potential between nodes <math|x<rsub|i>> and <math|x<rsub|j>>
  denoting the statistical dependence between them, and
  <math|\<phi\><rsub|i><around*|(|x<rsub|i>|)>> is the self potential of node
  <math|x<rsub|i>>.

  <subsection|MRF of MIMO System>

  The MRF of a MIMO system is, in general, a fully connected graph. Figure
  <reference|fig2x> shows the MRF for an <math|8\<times\>8> MIMO system. To
  obtain the MRF potentials for the MIMO system, we note that the posterior
  probability function of the random vector <math|<with|font-series|bold|x>>,
  given <math|<with|font-series|bold|r>> and <math|<with|font-series|bold|H>>,
  is of the form<footnote|In our detection problem, relative values of the
  distribution for various possibilities of <math|<math-bf|x>> are adequate.
  So, we can omit the normalization constant <math|Z>, which is independent
  of <math|<with|font-series|bold|x>>, and replace the equality with
  proportionality in the distribution.>

  <eqnarray|<tformat|<table|<row|<cell|<space|-0mm>p<around|(|<with|font-series|bold|x>\<mid\><with|font-series|bold|r>,<with|font-series|bold|H>|)>>|<cell|\<propto\>>|<cell|exp
  <around*|(|<frac|-1|2*\<sigma\><rsup|2>>*<around|\<\|\|\>|<with|font-series|bold|r>-<with|font-series|bold|Hx>|\<\|\|\>><rsup|2>|)>*exp
  <around*|(|ln p<around|(|<with|font-series|bold|x>|)>|)>>>|<row|<cell|>|<cell|=>|<cell|exp
  <around*|(|-<frac|1|2*\<sigma\><rsup|2>>*<around|(|<with|font-series|bold|r>-<with|font-series|bold|Hx>|)><rsup|H>*<around|(|<with|font-series|bold|r>-<with|font-series|bold|Hx>|)>|)>>>|<row|<cell|>|<cell|>|<cell|\<cdot\><big|prod><rsub|i>exp
  <around*|(|ln p<around|(|x<rsub|i>|)>|)>>>|<row|<cell|>|<cell|\<propto\>>|<cell|exp
  <around*|(|-<frac|1|2*\<sigma\><rsup|2>>*<around*|(|<with|font-series|bold|x><rsup|H>*<with|font-series|bold|H><rsup|H>*<with|font-series|bold|Hx>-2*\<Re\>*<around|{|<with|font-series|bold|x><rsup|H>*<with|font-series|bold|H><rsup|H>*<with|font-series|bold|r>|}>|)>|)>>>|<row|<cell|>|<cell|>|<cell|\<cdot\><big|prod><rsub|i>exp
  <around*|(|ln p<around|(|x<rsub|i>|)>|)>.<eq-number><label|dist>>>>>>

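  Now, defining <math|<with|font-series|bold|R><Define><frac|1|\<sigma\><rsup|2>>*<with|font-series|bold|H><rsup|H>*<with|font-series|bold|H><space|0.27em>>
  and <math|<space|0.27em><with|font-series|bold|z><Define><frac|1|\<sigma\><rsup|2>>*<with|font-series|bold|H><rsup|H>*<with|font-series|bold|r>>,
  and noting that, for Hermitian <math|<with|font-series|bold|R>> and
  <math|x<rsub|i>\<in\>{\<pm\>1}>,
  <math|<with|font-series|bold|x><rsup|H>*<with|font-series|bold|R>*<with|font-series|bold|x>=<big|sum><rsub|i>R<rsub|i*i>+2*<big|sum><rsub|i\<less\>j>\<Re\>*<around|{|x<rsub|i><rsup|\<ast\>>*R<rsub|i*j>*x<rsub|j>|}>>,
  whose first term does not depend on <math|<with|font-series|bold|x>>, we
  can write (<reference|dist>) as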

  <with|font-size|0.84|<eqnarray|<tformat|<table|<row|<cell|p<around|(|<with|font-series|bold|x>\<mid\><with|font-series|bold|r>,<with|font-series|bold|H>|)>>|<cell|\<propto\>>|<cell|exp
  <around*|(|-<big|sum><rsub|i\<less\>j>\<Re\>*<around|{|x<rsub|i><rsup|\<ast\>>*R<rsub|i*j>*x<rsub|j>|}>|)>>>|<row|<cell|>|<cell|>|<cell|\<cdot\>exp
  <around*|(|<big|sum><rsub|i>\<Re\>*<around|{|x<rsub|i><rsup|\<ast\>>*z<rsub|i>|}>|)>*<big|prod><rsub|i>exp
  <around*|(|ln p<around|(|x<rsub|i>|)>|)>>>|<row|<cell|>|<cell|<space|-4.6cm>=>|<cell|<space|-2.6cm><around*|(|<big|prod><rsub|i\<less\>j><space|-0.5mm>exp
  <space|-0.5mm><around*|(|<space|-1mm>-<space|-1mm>x<rsub|i>*\<Re\><around|{|R<rsub|i*j>|}>*x<rsub|j>|)><space|-1mm>|)><space|-1mm><around*|(|<big|prod><rsub|i><space|-0.5mm>exp
  <space|-0.5mm><around*|(|x<rsub|i>*\<Re\><around|{|z<rsub|i>|}>+ln
  p<around|(|x<rsub|i>|)>|)><space|-1mm>|)><space|-1mm>,<eq-number><label|dist2>>>>>>>

  <space|-5mm>where <math|z<rsub|i>> and <math|R<rsub|i*j>> are the elements
  of <math|<with|font-series|bold|z>> and <math|<with|font-series|bold|R>>,
  respectively. Comparing (<reference|dist2>) and (<reference|eqnz2>), we see
  that the MRF of the MIMO system has only pairwise interactions with the
  following potentials

  <eqnarray|<tformat|<table|<row|<cell|\<psi\><rsub|i,j><around*|(|x<rsub|i>,x<rsub|j>|)>>|<cell|=>|<cell|exp
  <around*|(|<space|-1mm>-<space|-1mm>x<rsub|i>*\<Re\><around|{|R<rsub|i*j>|}>*x<rsub|j>|)>,<eq-number><label|psix1>>>|<row|<cell|\<phi\><rsub|i><around*|(|x<rsub|i>|)>>|<cell|=>|<cell|exp
  <around*|(|x<rsub|i>*\<Re\><around|{|z<rsub|i>|}>+ln
  p<around|(|x<rsub|i>|)>|)>.<eq-number><label|phix1>>>>>>
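  The Python sketch below gives a numerical sanity check on
  (<reference|psix1>) and (<reference|phix1>). It is an illustration, not
  part of the original work, and all function names are hypothetical. It
  forms the log-potentials from a small random channel. It then checks by
  exhaustive enumeration that, for BPSK with a uniform prior, the logarithm
  of the product of potentials differs from the log-likelihood only by a
  constant. As in (<reference|dist>), it assumes a likelihood proportional
  to the exponential of minus the squared norm of
  <math|<with|font-series|bold|r>-<with|font-series|bold|Hx>> divided by
  <math|2*\<sigma\><rsup|2>>.

```python
import itertools

import numpy as np


def mrf_potentials(H, r, sigma2):
    """Log-domain edge and self potentials of the pairwise MRF (BPSK, uniform prior)."""
    R = (H.conj().T @ H) / sigma2   # R = H^H H / sigma^2
    z = (H.conj().T @ r) / sigma2   # z = H^H r / sigma^2
    J = -R.real                     # ln psi_ij(x_i, x_j) = x_i * J[i, j] * x_j
    np.fill_diagonal(J, 0.0)        # the MRF has no self-edges
    h = z.real.copy()               # ln phi_i(x_i) = x_i * h[i]; ln p(x_i) is constant here
    return J, h


def log_mrf(x, J, h):
    """Log of the product of potentials; 0.5 * x^T J x equals the sum over pairs i before j."""
    return 0.5 * (x @ J @ x) + h @ x


def log_likelihood(x, H, r, sigma2):
    """Log-likelihood up to a constant: minus the squared norm of r - Hx over 2 sigma^2."""
    e = r - H @ x
    return -np.vdot(e, e).real / (2.0 * sigma2)


rng = np.random.default_rng(0)
n_r, n_t, sigma2 = 6, 4, 0.5
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x_true = rng.choice([-1.0, 1.0], size=n_t)
r = H @ x_true + np.sqrt(sigma2) * rng.standard_normal(n_r)
J, h = mrf_potentials(H, r, sigma2)
# For every BPSK vector x, the two log-densities must differ by the same constant
diffs = [log_likelihood(np.array(x), H, r, sigma2) - log_mrf(np.array(x), J, h)
         for x in itertools.product([-1.0, 1.0], repeat=n_t)]
print(np.allclose(diffs, diffs[0]))
```

  The check works because the dropped terms (the squared norm of
  <math|<with|font-series|bold|r>> and the diagonal of
  <math|<with|font-series|bold|R>>) do not depend on
  <math|<with|font-series|bold|x>>.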

  <big-figure|<with|par-mode|center|<image|fig3.eps|2.25in|2.25in||><label|fig2x>>|Fully
  connected MRF of <math|8\<times\>8> MIMO system.>

  <subsection|Message Passing><label|sec-mrf-mp>

  The values of <math|\<psi\>> and <math|\<phi\>> given by
  (<reference|psix1>) and (<reference|phix1>) define, respectively, the edge
  and self potentials of an undirected graphical model to which message
  passing algorithms, such as belief propagation (BP), can be applied to
  compute the marginal probabilities of the variables. BP attempts to
  estimate the marginal probabilities of all the variables by way of passing
  messages between the local nodes.

  A <with|font-shape|italic|message> from node <math|j> to node <math|i> is
  denoted as <math|m<rsub|j,i><around*|(|x<rsub|i>|)>>. The belief at node
  <math|i> is denoted as <math|<text|b><rsub|i><around|(|x<rsub|i>|)>>,
  <math|x<rsub|i>\<in\>{\<pm\>1}>. The belief
  <math|<text|b><rsub|i><around|(|x<rsub|i>|)>> is proportional to how likely
  it is that <math|x<rsub|i>> was transmitted. The message
  <math|m<rsub|j,i><around|(|x<rsub|i>|)>> is proportional to how likely node
  <math|j> considers it that <math|x<rsub|i>> was transmitted. The belief at
  node <math|i> is

  <eqnarray|<tformat|<table|<row|<cell|<text|b><rsub|i><around*|(|x<rsub|i>|)>>|<cell|\<propto\>>|<cell|\<phi\><rsub|i><around*|(|x<rsub|i>|)>*<big|prod><rsub|j\<in\>\<cal-N\><around*|(|i|)>>m<rsub|j,i><around*|(|x<rsub|i>|)>,<eq-number><label|eqnmrfbelief>>>>>>

  where <math|\<cal-N\><around|(|i|)>> denotes the set of neighbors of node
  <math|i>, and the messages are defined as <cite|merl>

  <eqnarray|<tformat|<table|<row|<cell|<space|-7mm>m<rsub|j,i><around*|(|x<rsub|i>|)>>|<cell|<space|-1mm>\<propto\>>|<cell|<space|-1mm><big|sum><rsub|x<rsub|j>>\<phi\><rsub|j><around*|(|x<rsub|j>|)>*\<psi\><rsub|j,i><around*|(|x<rsub|j>,x<rsub|i>|)>*<big|prod><rsub|k\<in\>\<cal-N\><around*|(|j|)>\<setminus\>i><space|-2mm>m<rsub|k,j><around*|(|x<rsub|j>|)>.<eq-number><label|eqnmsgdefn>>>>>>

  Equation (<reference|eqnmsgdefn>) defines the messages recursively, since
  each message is expressed in terms of other messages. BP therefore
  evaluates it iteratively. In each iteration, every node computes its
  outgoing message to each neighbor from its local potentials and from the
  incoming messages of its other neighbors, and then transmits it. The
  algorithm terminates after a fixed number of iterations, and the beliefs
  are then obtained from (<reference|eqnmrfbelief>).

  <subsection|Improvement through Damping><label|sec>

  In systems characterized by fully/highly connected graphical models,
  BP-based algorithms may fail to converge. Even when they do converge, the
  estimated marginals may be far from exact <cite|mooij3>,<cite|mooij2>. BP
  may therefore be expected to perform poorly on MIMO graphs because of
  their high density of connections. However, several methods are known in
  the literature that can be applied when BP does not converge (or
  converges too slowly). These include
  <with|font-shape|italic|double-loop methods> <cite|Heskes>,<cite|yuille>
  and <with|font-shape|italic|damping> <cite|damp>,<cite|loopybp6>. In this
  paper, we consider damping methods.

  In <cite|damp>, Pretti proposed a modified version of BP with over-relaxed
  BP dynamics. At each step of the algorithm, the new estimate is taken to
  be a weighted average of the old estimate and the newly computed one. The
  weighted average can be applied to the messages (resulting in
  <with|font-shape|italic|message damped BP>), to the estimates of the
  probability distributions (beliefs) of the variables
  (<with|font-shape|italic|probability/belief damped BP>), or to both
  messages and beliefs (<with|font-shape|italic|hybrid damped BP>). It is
  shown in <cite|damp> that probability damped BP can be derived as a limit
  case in which the double-loop algorithm becomes a single-loop one.

  <with|font-shape|italic|Message Damping:> Let
  <math|<wide|m|~><rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>>
  denote the updated message obtained by message passing in iteration
  <math|t>. The damped message from node <math|i> to node <math|j> in
  iteration <math|t>, denoted by
  <math|m<rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>>, is computed
  as a convex combination of the previous message and the updated message:

  <eqnarray|<tformat|<table|<row|<cell|<space|-7.5mm><wide|m|~><rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>>|<cell|<space|-2mm>\<propto\>>|<cell|<space|-2mm><big|sum><rsub|x<rsub|i>>\<phi\><rsub|i><around*|(|x<rsub|i>|)>*\<psi\><rsub|i,j><around*|(|x<rsub|i>,x<rsub|j>|)>*<space|-1mm><big|prod><rsub|k\<in\>\<cal-N\><around*|(|i|)>\<setminus\>j><space|-3mm>m<rsub|k,i><rsup|<around|(|t-1|)>><around*|(|x<rsub|i>|)>,<eq-number><label|eqn-mtilde>>>>>>

  <eqnarray|<tformat|<table|<row|<cell|<space|-4mm>m<rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>>|<cell|<space|-1mm>=>|<cell|<space|-1mm>\<alpha\><rsub|m>*<space|0.17em>m<rsub|i,j><rsup|<around|(|t-1|)>><around|(|x<rsub|j>|)>+<around|(|1-\<alpha\><rsub|m>|)>*<space|0.17em><wide|m|~><rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>,<eq-number><label|eqn>>>>>>

  where <math|\<alpha\><rsub|m>\<in\><around|[|0,1|)>> is referred to as the
  <with|font-shape|italic|message damping factor>.

  <with|font-shape|italic|Belief Damping:> Instead of damping the messages,
  the beliefs of the variables can be computed in each iteration as a
  weighted average:

  <eqnarray|<tformat|<table|<row|<cell|<wide|<text|b>|~><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>>|<cell|\<propto\>>|<cell|\<phi\><rsub|i><around|(|x<rsub|i>|)>*<big|prod><rsub|j\<in\>\<cal-N\><around|(|i|)>>m<rsub|j,i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>,<eq-number><label|eqn-btilde>>>>>>

  <eqnarray|<tformat|<table|<row|<cell|<text|b><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>>|<cell|=>|<cell|\<alpha\><rsub|b><space|0.17em><text|b><rsub|i><rsup|<around|(|t-1|)>><around|(|x<rsub|i>|)>+<around|(|1-\<alpha\><rsub|b>|)><space|0.17em><wide|<text|b>|~><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>,<eq-number><label|eqn-bdamp>>>>>>

  where <math|\<alpha\><rsub|b>\<in\><around|[|0,1|)>> is referred to as the
  <with|font-shape|italic|belief damping factor>.

  <with|font-shape|italic|Hybrid Damping:> As a more general damping
  strategy, both the messages and the beliefs can be damped in each
  iteration, according to (<reference|eqn>) and (<reference|eqn-bdamp>),
  respectively. Different combinations of
  <math|<around|(|\<alpha\><rsub|m>,\<alpha\><rsub|b>|)>> values specialize
  to different strategies. For example,
  <with|font-size|0.84|mode|math|<around|(|\<alpha\><rsub|m>=\<alpha\><rsub|b>=0|)>>
  corresponds to undamped BP, <with|font-size|0.84|mode|math|<around|(|\<alpha\><rsub|m>\<neq\>0,\<alpha\><rsub|b>=0|)>>
  corresponds to message damped BP, <with|font-size|0.84|mode|math|<around|(|\<alpha\><rsub|m>=0,\<alpha\><rsub|b>\<neq\>0|)>>
  corresponds to belief damped BP, and <with|font-size|0.84|mode|math|<around|(|\<alpha\><rsub|m>\<neq\>0,\<alpha\><rsub|b>\<neq\>0|)>>
  corresponds to hybrid damped BP.

  The proposed BP algorithm employing damping is listed in Table
  <reference|table2>.
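  The iterations of Table <reference|table2> can be sketched in NumPy as
  follows. This is an illustrative sketch, not the authors' implementation,
  and the function name is hypothetical. The potentials are parameterized
  in the log domain as <math|ln \<psi\><rsub|i,j>=J<rsub|i*j>*x<rsub|i>*x<rsub|j>>
  with <math|J<rsub|i*j>=-\<Re\><around|{|R<rsub|i*j>|}>>, and as
  <math|ln \<phi\><rsub|i>=h<rsub|i>*x<rsub|i>> with
  <math|h<rsub|i>=\<Re\><around|{|z<rsub|i>|}>> (uniform prior). The small
  floor on the messages is an added numerical safeguard. For each node, the
  product over all incoming messages is evaluated once as a sum of
  logarithms, and the message from the destination node is then removed.
  Damping is applied to messages and beliefs in the probability domain, as
  in the damping updates above. Setting
  <math|\<alpha\><rsub|m>=\<alpha\><rsub|b>=0> gives undamped BP.

```python
import numpy as np

VALS = np.array([-1.0, 1.0])  # index 0 is x = -1, index 1 is x = +1


def damped_mrf_bp(J, h, num_iter=5, alpha_m=0.2, alpha_b=0.0):
    """Hybrid-damped BP on a pairwise BPSK MRF (sketch of Table 2).

    J[i, j] = -Re{R_ij} with zero diagonal, h[i] = Re{z_i}; uniform prior.
    Returns hard decisions and the final beliefs b[i, s] = b_i(VALS[s]).
    """
    n = len(h)
    log_phi = np.outer(h, VALS)                                    # ln phi_i(x_i), shape (n, 2)
    log_psi = J[:, :, None, None] * np.multiply.outer(VALS, VALS)  # ln psi_ij(x_i, x_j)
    # m[i, j, t] = m_{i,j}(x_j = VALS[t]); m[i, i] stays uniform because J[i, i] = 0
    m = np.full((n, n, 2), 0.5)
    b = np.full((n, 2), 0.5)
    for _ in range(num_iter):
        logm = np.log(m)
        # ln phi_i(x_i) plus the log of the product of all incoming messages m_{k,i}(x_i)
        total = log_phi + logm.sum(axis=0)
        # Exclude the message that came from j when computing the message to j
        cavity = total[:, None, :] - logm.transpose(1, 0, 2)
        a = cavity[:, :, :, None] + log_psi                        # axes (i, j, x_i, x_j)
        new = np.logaddexp(a[:, :, 0, :], a[:, :, 1, :])           # sum over x_i
        new -= np.logaddexp(new[..., 0], new[..., 1])[..., None]   # normalize each message
        m = alpha_m * m + (1.0 - alpha_m) * np.exp(new)            # message damping
        m = np.maximum(m, 1e-300)                                  # added guard against log(0)
        logb = log_phi + np.log(m).sum(axis=0)                     # beliefs from the new messages
        logb -= np.logaddexp(logb[:, 0], logb[:, 1])[:, None]
        b = alpha_b * b + (1.0 - alpha_b) * np.exp(logb)           # belief damping
    return VALS[np.argmax(b, axis=1)], b


rng = np.random.default_rng(1)
n, sigma2 = 8, 0.5
H = rng.standard_normal((n, n))      # real-valued channel, for illustration only
x = rng.choice(VALS, size=n)
r = H @ x + np.sqrt(sigma2) * rng.standard_normal(n)
J = -(H.T @ H) / sigma2
np.fill_diagonal(J, 0.0)
h = (H.T @ r) / sigma2
x_hat, beliefs = damped_mrf_bp(J, h, num_iter=5, alpha_m=0.2)
print("symbol errors:", int(np.sum(x_hat != x)))
```

  A two-variable model is a tree, so a single undamped iteration already
  returns the exact marginals. This makes it a convenient unit test for
  such an implementation.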

  <subsection|Computational Complexity><label|sec-mrf-complexity>

  The per-symbol complexity of calculating messages and beliefs in a single
  BP iteration is <math|O*<around|(|K<rsup|2>*n<rsub|t><rsup|2>|)>> and
  <math|O*<around|(|K*n<rsub|t>|)>>, respectively. Likewise, the per-symbol
  complexity of computing <math|\<phi\>> and <math|\<psi\>> is
  <math|O<around|(|1|)>> and <math|O*<around|(|K*n<rsub|t>|)>>, respectively.
  The computation of <math|<with|font-series|bold|z>> can be carried out with
  <math|O*<around|(|K*n<rsub|r>|)>> per-symbol complexity. The computation of
  <math|<with|font-series|bold|R>> involves computation of
  <with|font-size|0.84|mode|math|<with|font-series|bold|H><rsup|H>*<with|font-series|bold|H>>,
  which involves three operations: <math|i)> computation of
  <math|<math-bf|G>>, <math|i*i)> calculation of
  <with|font-size|0.84|mode|math|<math-bf|G><rsup|H><math-bf|G>>, and
  <math|i*i*i)> multiplication of <math|<math-bf|F><rsup|H>> and
  <math|<math-bf|F>> with <math|<math-bf|G><rsup|H><math-bf|G>>. The
  computation <math|i)> involves <math|K>-point FFTs of the matrices
  <math|H<rsub|l>>, <math|l=0,\<cdots\>,L-1>, each <math|H<rsub|l>> of
  dimension <math|n<rsub|r>\<times\>n<rsub|t>>. The complexity associated
  with this operation is <math|O*<around|(|n<rsub|t>*n<rsub|r>*K*log<rsub|2>
  K|)>>. The total number of symbols transmitted is <math|K*n<rsub|t>>. So,
  the per-symbol complexity is <math|O*<around|(|n<rsub|r>*log<rsub|2> K|)>>.
  The computation <math|i*i)> involves the calculation of
  <math|<math-bf|G><rsub|i><rsup|H><math-bf|G><rsub|i>> for
  <math|i=0,\<cdots\>,K-1>. The computation of each
  <math|<math-bf|G><rsub|i><rsup|H><math-bf|G><rsub|i>> has complexity
  <math|O<around|(|n<rsub|t><rsup|3>|)>>. Due to the block-diagonal structure of
  <math|<math-bf|G>>, <math|K> such computations can be done in
  <math|O*<around|(|K*n<rsub|t><rsup|3>|)>> complexity, leading to a
  per-symbol complexity of <math|O<around|(|n<rsub|t><rsup|2>|)>>. Likewise,
  due to the block-symmetric structure of <math|<math-bf|F>>, the per-symbol
  complexity corresponding to computation <math|i*i*i)> is
  <math|O*<around|(|K*n<rsub|t><rsup|2>|)>>. Since the number of BP
  iterations is much smaller than <math|K*n<rsub|t>>, the overall per-symbol
  complexity of the proposed MRF-based BP detection algorithm is dominated by
  the message computation and is given by
  <math|O*<around|(|K<rsup|2>*n<rsub|t><rsup|2>|)>>. This is quadratic in
  <math|K*n<rsub|t>> and hence remains tractable for large
  <math|K*n<rsub|t>>.
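  The NumPy sketch below illustrates the FFT-based computation of
  <math|<with|font-series|bold|R>> described above. It assumes the usual
  CPSC block-circulant structure, in which block <math|<around|(|p,q|)>> of
  <math|<with|font-series|bold|H>> equals
  <math|H<rsub|<around|(|p-q|)> mod K>>, with <math|H<rsub|l>=0> for
  <math|l\<geq\>L> and <math|L\<leq\>K>. The function names are
  hypothetical. Step <math|i*i*i)> is carried out with an inverse FFT. This
  is equivalent to the multiplication by <math|<math-bf|F><rsup|H>> and
  <math|<math-bf|F>> under a unitary DFT convention, which is also an
  assumption, since <math|<math-bf|F>> and <math|<math-bf|G>> are not
  restated in this section.

```python
import numpy as np


def block_circulant(taps, K):
    """Assumed CPSC model: block (p, q) of H equals H_{(p - q) mod K}, zero beyond tap L - 1."""
    L, n_r, n_t = taps.shape
    padded = np.zeros((K, n_r, n_t), dtype=complex)
    padded[:L] = taps                                   # requires L not larger than K
    H = np.zeros((K * n_r, K * n_t), dtype=complex)
    for p in range(K):
        for q in range(K):
            H[p * n_r:(p + 1) * n_r, q * n_t:(q + 1) * n_t] = padded[(p - q) % K]
    return H


def gram_via_fft(taps, K, sigma2=1.0):
    """R = H^H H / sigma^2 computed through K-point FFTs, without forming H."""
    n_t = taps.shape[2]
    # i) G_k = sum_l H_l exp(-j 2 pi k l / K): one K-point FFT per tap entry
    G = np.fft.fft(taps, n=K, axis=0)                   # shape (K, n_r, n_t)
    # ii) K small Gram matrices G_k^H G_k, thanks to the block-diagonal structure
    D = np.conj(np.transpose(G, (0, 2, 1))) @ G         # shape (K, n_t, n_t)
    # iii) back to the block-circulant domain: first block column of H^H H
    c = np.fft.ifft(D, axis=0)
    R = np.zeros((K * n_t, K * n_t), dtype=complex)
    for p in range(K):
        for q in range(K):
            R[p * n_t:(p + 1) * n_t, q * n_t:(q + 1) * n_t] = c[(p - q) % K]
    return R / sigma2


rng = np.random.default_rng(2)
L, K, n_r, n_t = 3, 6, 2, 3
taps = rng.standard_normal((L, n_r, n_t)) + 1j * rng.standard_normal((L, n_r, n_t))
H = block_circulant(taps, K)
print(np.allclose(gram_via_fft(taps, K), H.conj().T @ H))
```

  For small sizes, comparing the output with the explicitly formed product
  is a simple way to confirm the block-circulant and DFT sign conventions.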

  <\big-table>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|32|32|1|-1|cell-bborder|1ln>|<table|<row|<cell|>>|<row|<cell|<space|3mm><with|font-shape|italic|Initialization>>>|<row|<cell|1.
    <math|m<rsub|i,j><rsup|<around|(|0|)>><around|(|x<rsub|j>|)>=<text|b><rsub|i><rsup|<around|(|0|)>><around|(|x<rsub|i>|)>=0.5>,>>|<row|<cell|<math|<space|0.17em><space|0.17em>>
    <math|p*<around|(|x<rsub|i>=1|)>=p*<around|(|x<rsub|i>=-1|)>=0.5>,
    <math|<space|0.17em>> <math|\<forall\>i,j=1,\<cdots\>,K*n<rsub|t>>>>|<row|<cell|2.
    <math|<wide|m|~><rsub|i,j><rsup|<around|(|0|)>><around|(|x<rsub|j>|)>=<wide|<text|b>|~><rsub|i><rsup|<around|(|0|)>><around|(|x<rsub|i>|)>=0.5>,
    <math|<space|0.17em>> <math|\<forall\>i,j=1,\<cdots\>,K*n<rsub|t>>>>|<row|<cell|3.
    <math|<with|font-series|bold|z>=<frac|1|\<sigma\><rsup|2>>*<space|0.17em><with|font-series|bold|H><rsup|H>*<with|font-series|bold|r>>;
    <space|0.17em><space|0.17em><space|0.17em>
    <math|<math-bf|R>=<frac|1|\<sigma\><rsup|2>>*<space|0.17em><with|font-series|bold|H><rsup|H>*<with|font-series|bold|H>>>>|<row|<cell|4.
    <math|<space|0.17em><space|0.17em>> for <math|i> = 1 to
    <math|K*n<rsub|t>>>>|<row|<cell|5. <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>\<phi\><rsub|i><around|(|x<rsub|i>|)>=exp
    <around*|(|x<rsub|i>*\<Re\><around*|{|z<rsub|i>|}>+ln
    <around|(|p<around|(|x<rsub|i>|)>|)>|)>>>>|<row|<cell|6.
    <math|<space|0.17em><space|0.17em>> end for>>|<row|<cell|7.
    <math|<space|0.17em>> for <math|i> = 1 to
    <math|K*n<rsub|t>>>>|<row|<cell|8. <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>>for
    <math|j> = 1 to <math|K*n<rsub|t>>, <math|<space|0.17em><space|0.17em><space|0.17em>j\<neq\>i>>>|<row|<cell|9.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>\<psi\><rsub|i,j><around*|(|x<rsub|i>,x<rsub|j>|)>=exp
    <around*|(|-x<rsub|i>*\<Re\><around*|{|R<rsub|i,j>|}>*x<rsub|j>|)>>>>|<row|<cell|10.
    <math|<space|0.17em><space|0.17em><space|0.17em>>end for>>|<row|<cell|11.
    end for>>|<row|<cell|<space|3mm><with|font-shape|italic|Iterative Update
    of Messages and Beliefs>>>|<row|<cell|12. for <math|t> = 1 to
    <math|<text|num_iter>>>>|<row|<cell|<space|5mm><with|font-shape|italic|Damped
    Message Calculation>>>|<row|<cell|13.
    <math|<space|0.17em><space|0.17em><space|0.17em>>for <math|i> = 1 to
    <math|K*n<rsub|t>>>>|<row|<cell|14. <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>>for
    <math|j> = 1 to <math|K*n<rsub|t>,<space|0.17em><space|0.17em><space|0.17em>j\<neq\>i>>>|<row|<cell|15.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><wide|m|~><rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>\<propto\><big|sum><rsub|x<rsub|i>>\<phi\><rsub|i><around|(|x<rsub|i>|)>*\<psi\><rsub|i,j><around|(|x<rsub|i>,x<rsub|j>|)>>>>|<row|<cell|<space|4cm><math|\<cdot\><big|prod><rsub|k\<in\>\<cal-N\><around|(|i|)>\<setminus\>j>m<rsub|k,i><rsup|<around|(|t-1|)>><around|(|x<rsub|i>|)>>>>|<row|<cell|16.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>m<rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>=\<alpha\><rsub|m>*<space|0.17em>m<rsub|i,j><rsup|<around|(|t-1|)>><around|(|x<rsub|j>|)>+<around|(|1-\<alpha\><rsub|m>|)>*<space|0.17em><wide|m|~><rsub|i,j><rsup|<around|(|t|)>><around|(|x<rsub|j>|)>>>>|<row|<cell|17.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>>end
    for>>|<row|<cell|18. <math|<space|0.17em><space|0.17em><space|0.17em>>end
    for>>|<row|<cell|<space|5mm><with|font-shape|italic|Damped Belief
    Calculation>>>|<row|<cell|19. <math|<space|0.17em><space|0.17em><space|0.17em>>for
    <math|i> = 1 to <math|K*n<rsub|t>>>>|<row|<cell|20.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>>
    <math|<wide|<text|b>|~><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>\<propto\>\<phi\><rsub|i><around|(|x<rsub|i>|)>*<big|prod><rsub|j\<in\>\<cal-N\><around|(|i|)>>m<rsub|j,i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>>>>|<row|<cell|21.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>>
    <math|<text|b><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>=\<alpha\><rsub|b><space|0.17em><text|b><rsub|i><rsup|<around|(|t-1|)>><around|(|x<rsub|i>|)>+<around|(|1-\<alpha\><rsub|b>|)><space|0.17em><wide|<text|b>|~><rsub|i><rsup|<around|(|t|)>><around|(|x<rsub|i>|)>>>>|<row|<cell|22.
    <math|<space|0.17em><space|0.17em><space|0.17em><space|0.17em>>end
    for>>|<row|<cell|23. end for; <space|5mm>End of for loop starting at line
    12>>|<row|<cell|24. <math|<wide|x|^><rsub|i>=<ontop|<text|arg
    max>|x<rsub|i>\<in\>{\<pm\>1}><space|0.17em><space|0.17em><text|b><rsub|i><rsup|<around|(|<text|num_iter>|)>><around*|(|x<rsub|i>|)>,<space|0.17em><space|0.17em><space|0.17em>\<forall\><space|0.17em>i=1,\<cdots\>,K*n<rsub|t>>>>|<row|<cell|25.
    Terminate>>>>><label|table2>

    <vspace|-8mm>
  </big-table|Proposed MRF Based BP Detector/Equalizer Algorithm.>

  <subsection|Simulation Results><label|sec-mrf-sim>

  In this section, we present the simulated BER performance of the proposed
  MRF BP detection algorithm.

  <with|font-shape|italic|Performance in Flat-Fading with Large
  <math|n<rsub|t>>:> In Figs. <reference|fig4> to <reference|fig6>, we
  illustrate the `large-dimension behavior' of the algorithm and the effect
  of damping for a large number (tens) of transmit and receive antennas with
  BPSK modulation on flat fading channels (i.e., <math|L=K=1>). The number of
  BP iterations is 5. Figure <reference|fig4> shows the achieved BER as a
  function of the message damping factor, <math|\<alpha\><rsub|m>>, in
  <math|16\<times\>16> and <math|24\<times\>24> V-BLAST MIMO systems. The
  average received SNR per receive antenna, <math|\<gamma\>>, is 8 dB. Note
  that <math|\<alpha\><rsub|m>=0> corresponds to undamped BP. It can be
  observed from Fig. <reference|fig4> that, depending on the choice of
  <math|\<alpha\><rsub|m>>, message damping can significantly improve the BER
  performance of the BP algorithm. There is an optimum value of
  <math|\<alpha\><rsub|m>> at which the BER improvement over the undamped
  case is maximum. For the system parameters in Fig. <reference|fig4>, this
  optimum is observed to be about 0.2. At <math|\<alpha\><rsub|m>=0.2>,
  message damping improves the BER by about an order of magnitude compared
  with no damping. Fig. <reference|fig4> further shows that the performance
  improves as <math|n<rsub|t>=n<rsub|r>> increases. That is, the
  <math|n<rsub|t>=n<rsub|r>=24> system performs better than the
  <math|n<rsub|t>=n<rsub|r>=16> system. This shows that the algorithm
  exhibits `large-dimension behavior,' i.e., its BER performance moves closer
  to the unfaded SISO AWGN performance when <math|n<rsub|t>=n<rsub|r>> is
  increased from 16 to 24. This large-dimension behavior is illustrated even
  more clearly in Fig. <reference|fig5>, where we plot the BER performance of
  V-BLAST MIMO as a function of SNR for <math|n<rsub|t>=n<rsub|r>=4,8,16,24>,
  and <math|32> with <math|\<alpha\><rsub|m>=0.2>.

  <big-figure|<space|-2mm><image|fig4.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig4>|BER performance of the MRF BP algorithm as a
  function of message damping factor, <math|\<alpha\><rsub|m>>, in V-BLAST
  MIMO with <math|n<rsub|t>=n<rsub|r>=16,24> on flat fading (<math|L=K=1>) at
  8 dB SNR. # BP iterations=5.>

  <big-figure|<space|-2mm><image|fig5.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig5>|BER performance of the MRF BP algorithm as a
  function of SNR in V-BLAST MIMO for different <math|n<rsub|t>=n<rsub|r>> on
  flat fading (<math|L=K=1>) with message damping,
  <math|\<alpha\><rsub|m>=0.2>, and # BP iterations = 5.>

  <big-figure|<space|-2mm><image|fig6.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig6>|Effect of message, belief, and hybrid damping on
  the BER performance of <math|8\<times\>8> STBC from CDA with
  <math|t=e<rsup|<with|font-series|bold|j>>>,
  <math|\<delta\>=e<rsup|<sqrt|5>*<space|0.17em><with|font-series|bold|j>>>,
  <math|n<rsub|t>=n<rsub|r>=8> on flat fading (<math|L=K=1>) at 8 dB SNR. MRF
  BP, # BP iterations = 5, <math|\<alpha\><rsub|m>=\<alpha\><rsub|b>> for
  hybrid damping.>

  In Fig. <reference|fig6>, we present a comparison of the BER performance
  achieved using message damping, belief damping and hybrid damping based BP
  detection of <math|8\<times\>8> non-orthogonal space-time block code (STBC)
  from cyclic division algebra (CDA) with
  <math|t=e<rsup|<with|font-series|bold|j>>>,
  <math|\<delta\>=e<rsup|<sqrt|5>*<space|0.17em><with|font-series|bold|j>>>
  <cite|bsr> at 8 dB SNR. In this class of STBCs, each codeword is an
  <math|n<rsub|t>\<times\>p> square matrix, with <math|n<rsub|t>> transmit
  antennas and <math|p=n<rsub|t>> time slots, constructed from
  <math|n<rsub|t><rsup|2>> symbols. This results in <math|n<rsub|t><rsup|2>>
  dimensions and <math|n<rsub|t>> symbols per channel use. For message
  damping and belief damping, <math|\<alpha\><rsub|m>> and
  <math|\<alpha\><rsub|b>>, respectively, are varied from 0 to 1. For hybrid
  damping, we set <math|\<alpha\><rsub|m>=\<alpha\><rsub|b>> and vary this
  common value from 0 to 1. From Fig. <reference|fig6>, it can be seen that <math|i>)
  with damping, there is an optimum value of the damping factor at which the
  BER performance is the best (e.g., for message damping, the optimum damping
  factor is about 0.3 in Fig. <reference|fig6>), <math|i*i>) message damping
  performs better than belief damping for small values of the damping factor,
  whereas belief damping performs better at high values of the damping
  factor; however, over the entire range of the damping factor, the best
  performance of message damping is significantly better than the best
  performance of belief damping, and <math|i*i*i>) for the chosen condition
  of <math|\<alpha\><rsub|m>=\<alpha\><rsub|b>>, hybrid damping performance
  is similar to that of message damping; however, <math|\<alpha\><rsub|m>>
  and <math|\<alpha\><rsub|b>> in hybrid damping can be jointly optimized to
  further improve the performance.

  <big-figure|<space|-3mm><image|fig7.eps|3.75in|2.90in||>
  <vspace|-5mm><label|fig7>|BER performance of the MRF BP algorithm as a
  function of the message damping factor, <math|\<alpha\><rsub|m>>, in
  MIMO-ISI channels. <math|n<rsub|t>=n<rsub|r>=4>,
  <math|<around|[|L=10,K=50|]>>, uniform power delay profile, average
  received SNR = 6 dB, # BP iterations = 7.>

  <with|font-shape|italic|Performance in MIMO-ISI Channels with Large
  <math|K*n<rsub|t>>:> In Fig. <reference|fig7>, we explore the effect of
  message damping on the BER performance of the MRF-based BP
  detector/equalizer in MIMO-ISI channels. In all the MIMO-ISI simulations,
  we assume a uniform power delay profile (i.e., all the <math|L> paths have
  equal energy). Figure <reference|fig7> shows the achieved BER as a function
  of the message damping factor, <math|\<alpha\><rsub|m>>, for
  <math|n<rsub|t>=n<rsub|r>=4>, BPSK, and <math|<around|[|L=10,K=50|]>>, at
  an average received SNR of 6 dB. The total number of dimensions is
  <math|K*n<rsub|t>=200>, and 7 BP iterations are used. From Fig.
  <reference|fig7>, it can be seen that damping can significantly improve the
  BER performance of the BP algorithm. For the system parameters in Fig.
  <reference|fig7>, the optimum value of <math|\<alpha\><rsub|m>> is observed
  to be about 0.45, which gives about an order of magnitude improvement in
  BER. Fig. <reference|fig8> brings out the benefit of damping even more
  clearly, in terms of both BER performance and convergence. It compares the
  BER without damping (<math|\<alpha\><rsub|m>=0>) and with damping
  (<math|\<alpha\><rsub|m>=0.45>) as a function of the number of BP
  iterations, for <math|<around|[|L=20,K=100|]>> at an SNR of 7 dB. Without
  damping (i.e., with <math|\<alpha\><rsub|m>=0>), the algorithm shows
  `divergence' behavior: the BER increases as the number of iterations is
  increased beyond 4. Damping effectively removes this divergence, and the
  algorithm with <math|\<alpha\><rsub|m>=0.45> is seen to converge smoothly.
  It is also interesting to note that the algorithm converges to a BER that
  is quite close to the unfaded SISO AWGN BER. At 7 dB SNR, the SISO AWGN BER
  is about <math|7.8\<times\>10<rsup|-4>>, and the converged BER using damped
  BP is about <math|1\<times\>10<rsup|-3>>. This illustrates the potential of
  damping to improve the BER performance and convergence of the algorithm.
  The setting is detection/equalization in the considered MIMO system on
  severely delay-spread frequency-selective channels (e.g., <math|L=20>).
  Note also that damping as per (<reference|eqn>) adds only a weighted
  average, i.e., a constant number of operations, per message. The order of
  complexity is therefore the same with and without damping.
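  The SISO AWGN reference value quoted above can be reproduced from the
  standard BPSK bit error rate
  <math|Q<around|(|<sqrt|2*\<gamma\>>|)>=<frac|1|2>*<text|erfc><around|(|<sqrt|\<gamma\>>|)>>.
  The short Python snippet below assumes that the 7 dB SNR is the per-bit
  SNR <math|E<rsub|b>/N<rsub|0>>, since BPSK carries one bit per symbol. It
  evaluates to roughly <math|7.7\<times\>10<rsup|-4>>, in line with the
  value of about <math|7.8\<times\>10<rsup|-4>> stated above.

```python
import math


def bpsk_awgn_ber(snr_db):
    """BER of BPSK on an unfaded AWGN channel: Q(sqrt(2 snr)) = 0.5 erfc(sqrt(snr)).

    Assumes snr_db is the per-bit SNR Eb/N0 in dB.
    """
    snr = 10.0 ** (snr_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(snr))


print(f"BPSK AWGN BER at 7 dB: {bpsk_awgn_ber(7.0):.2e}")
```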

  <with|font-shape|italic|Comparison with MIMO-OFDM Performance:> In Fig.
  <reference|fig9>, we compare the performance of the considered MIMO-CPSC
  scheme with that of a MIMO-OFDM scheme having the same system/channel
  parameters. We use <math|n<rsub|t>=n<rsub|r>=4> and the following
  combinations of <math|L> and <math|K>: <math|<around|[|L=5,K=25|]>>,
  <math|<around|[|L=10,K=50|]>>, and <math|<around|[|L=20,K=100|]>>. For
  MIMO-CPSC, two detection schemes are considered: FD-MMSE and the proposed
  MRF BP. For the MRF BP, 10 BP iterations are used with
  <math|\<alpha\><rsub|m>=0.45>. For MIMO-OFDM, two detection schemes are
  considered, namely, MMSE and ML detection on each subcarrier. We have also
  plotted the unfaded SISO
  AWGN performance that serves as a lower bound on the optimum detection
  performance. The following observations can be made from Fig.
  <reference|fig9>: <math|i)> MIMO-OFDM with MMSE detection performs the
  worst among all the considered system/detection configurations, <math|i*i)>
  MIMO-CPSC with FD-MMSE performs better than MIMO-OFDM with MMSE (this
  better performance in CPSC is in line with other reported comparisons
  between OFDM and CPSC, e.g., <cite|cpsc1>,<cite|cpsc2>,<cite|cpsc3>),
  <math|i*i*i>) at the expense of increased detection complexity, MIMO-OFDM
  with ML detection performs better than both MIMO-OFDM with MMSE and
  MIMO-CPSC with FD-MMSE, and <math|i*v>) more interestingly, MIMO-CPSC with
  the low-complexity MRF BP detection significantly outperforms MIMO-OFDM
  even with ML detection. Indeed, the performance of MIMO-CPSC with MRF BP
  detection gets increasingly closer to the SISO AWGN performance as
  <math|L> and <math|K> increase with <math|L/K> kept constant. For example,
  the gap between the MRF BP performance and the SISO AWGN performance is
  only about 0.25 dB for <math|L=20> at a BER of <math|10<rsup|-3>>. This
  illustrates the ability of the MRF BP algorithm to achieve near-optimal
  performance in severely delay-spread MIMO-ISI channels (i.e., large
  <math|L>), such as those encountered in UWB systems.

  <big-figure|<space|-3mm><image|fig8.eps|3.75in|2.90in||>
  <vspace|-5mm><label|fig8><vspace|-3mm>|Comparison of the BER performance of
  message damped and undamped MRF BP detector/equalizer as a function of
  number of BP iterations in MIMO-ISI channels. <math|n<rsub|t>=n<rsub|r>=4>,
  <math|<around|[|L=20,K=100|]>>, uniform power delay profile, average
  received SNR = 7 dB, <math|\<alpha\><rsub|m>=0> (undamped),
  <math|\<alpha\><rsub|m>=0.45> (damped). >

  <big-figure|<space|-3mm><image|fig9.eps|3.75in|2.90in||>
  <vspace|-7mm><label|fig9><vspace|-6mm>|BER performance of message damped
  MRF BP detector/equalizer as a function of average received SNR in MIMO-ISI
  channels with <math|n<rsub|t>=n<rsub|r>=4> for different values of <math|L>
  and <math|K> keeping <math|L/K> constant: <math|<around|[|L=5,K=25|]>>,
  <math|<around|[|L=10,K=50|]>>, and <math|<around|[|L=20,K=100|]>>. Uniform
  power delay profile. # BP iterations = 10, <math|\<alpha\><rsub|m>=0.45>.>

  <section|Detection using BP on Factor Graphs with Gaussian Approximation of
  Interference><label|sec4>

  In this section, we present another low-complexity algorithm based on BP
  for detection in large-dimension MIMO-ISI channels. The graphical model
  employed here is a factor graph. The key idea that enables low complexity
  in the proposed factor graph approach is the Gaussian approximation of
  interference (GAI) in the system.

  <\with|par-columns|1>
    <\big-figure>
      <subfigure*||<image|fig10a.eps|2.5in|2.00in||>>
      <space|10mm><subfigure*||<image|fig10b.eps|2.5in|2.00in||>>
    </big-figure|<label|fig10> Message passing between variable nodes and
    observation nodes.>
  </with>

  Consider the MIMO system model in (<reference|eqn1>). We will treat each
  entry of the observation vector <math|<with|font-series|bold|r>> as a
  function node (observation node) in a factor graph, and each transmitted
  symbol as a variable node. The received signal <math|r<rsub|i>> can be
  written as

  <eqnarray|<tformat|<table|<row|<cell|r<rsub|i>>|<cell|=>|<cell|<big|sum><rsub|j=1><rsup|K*n<rsub|t>>h<rsub|i*j>*x<rsub|j>+v<rsub|i>>>|<row|<cell|>|<cell|=>|<cell|h<rsub|i*k>*x<rsub|k>+<wide*|<big|sum><rsub|j=1,j\<ne\>k><rsup|K*n<rsub|t>>h<rsub|i*j>*x<rsub|j>|\<wide-underbrace\>><rsub|<text|interference>>+<space|0.17em><space|0.17em>v<rsub|i>.<eq-number><label|model1>>>>>>

  When computing the message from the <math|i>th observation node to the
  <math|k>th variable node, we make the following Gaussian approximation of
  the interference:

  <eqnarray|<tformat|<table|<row|<cell|r<rsub|i>>|<cell|=>|<cell|h<rsub|i*k>*x<rsub|k>+<wide*|<big|sum><rsub|j=1,j\<ne\>k><rsup|K*n<rsub|t>>h<rsub|i*j>*x<rsub|j>+v<rsub|i>|\<wide-underbrace\>><rsub|<Define><space|0.17em><space|0.17em>z<rsub|i*k><space|0.17em>>,<eq-number><label|model2>>>>>>

  where the interference plus noise term, <math|z<rsub|i*k>>, is modeled as
  <math|\<bbb-C\>*\<cal-N\><around|(|\<mu\><rsub|z<rsub|i*k>>,\<sigma\><rsup|2><rsub|z<rsub|i*k>>|)>>
  with

  <eqnarray|<tformat|<table|<row|<cell|\<mu\><rsub|z<rsub|i*k>>>|<cell|=>|<cell|<big|sum><rsub|j=1,j\<ne\>k><rsup|K*n<rsub|t>>h<rsub|i*j>*\<bbb-E\><around|(|x<rsub|j>|)>,<eq-number><label|mu>>>>>>

  <eqnarray|<tformat|<table|<row|<cell|\<sigma\><rsup|2><rsub|z<rsub|i*k>>>|<cell|=>|<cell|<big|sum><rsub|j=1,j\<ne\>k><rsup|K*n<rsub|t>><around|\||h<rsub|i*j>|\|><rsup|2><space|0.17em><text|Var><around|(|x<rsub|j>|)>+\<sigma\><rsup|2>.<eq-number><label|sigma2>>>>>>

  For BPSK signaling, the log-likelihood ratio (LLR) of the symbol
  <math|x<rsub|k>\<in\>{+1,-1}> at observation node <math|i>, denoted by
  <math|\<Lambda\><rsub|i><rsup|k>>, can be written as

  <eqnarray|<tformat|<table|<row|<cell|\<Lambda\><rsub|i><rsup|k>>|<cell|=>|<cell|log
  <frac|p*<around|(|r<rsub|i>\|<with|font-series|bold|H>,x<rsub|k>=1|)>|p*<around|(|r<rsub|i>\|<with|font-series|bold|H>,x<rsub|k>=-1|)>>>>|<row|<cell|>|<cell|=>|<cell|<frac|4|\<sigma\><rsub|z<rsub|i*k>><rsup|2>>*\<Re\>*<around*|(|h<rsub|i*k><rsup|\<ast\>>*<around|(|r<rsub|i>-\<mu\><rsub|z<rsub|i*k>>|)>|)>.<eq-number><label|LLR>>>>>>
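  The second equality in (<reference|LLR>) follows from the circularly
  symmetric complex Gaussian model of <math|z<rsub|i*k>>. Its density is
  proportional to <math|exp
  <around|(|-<around|\||z<rsub|i*k>-\<mu\><rsub|z<rsub|i*k>>|\|><rsup|2>/\<sigma\><rsup|2><rsub|z<rsub|i*k>>|)>>.
  Writing <math|a<rsub|i*k>=r<rsub|i>-\<mu\><rsub|z<rsub|i*k>>> and
  substituting <math|x<rsub|k>=\<pm\>1> in (<reference|model2>) gives

  <eqnarray*|<tformat|<table|<row|<cell|\<Lambda\><rsub|i><rsup|k>>|<cell|=>|<cell|<frac|<around|\||a<rsub|i*k>+h<rsub|i*k>|\|><rsup|2>-<around|\||a<rsub|i*k>-h<rsub|i*k>|\|><rsup|2>|\<sigma\><rsup|2><rsub|z<rsub|i*k>>>=<frac|4|\<sigma\><rsup|2><rsub|z<rsub|i*k>>>*\<Re\>*<around*|(|h<rsub|i*k><rsup|\<ast\>>*a<rsub|i*k>|)>,>>>>>

  since <math|<around|\||a+h|\|><rsup|2>-<around|\||a-h|\|><rsup|2>=4*\<Re\>*<around|(|h<rsup|\<ast\>>*a|)>>
  for any complex <math|a> and <math|h>.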

  The LLR values computed at the observation nodes are passed to the variable
  nodes (Fig. <reference|fig10>a). Using these LLRs, the variable nodes
  compute the probabilities

  <eqnarray|<tformat|<table|<row|<cell|<space|-6mm>p<rsub|i><rsup|k+>>|<cell|<Define>>|<cell|p<rsub|i>*<around|(|x<rsub|k>=+1\|<with|font-series|bold|r>|)>>>|<row|<cell|>|<cell|=>|<cell|<frac|<text|exp><around|(|<big|sum><rsub|l=1,l\<neq\>i><rsup|K*n<rsub|r>>\<Lambda\><rsub|l><rsup|k>|)>|1+<text|exp><around|(|<big|sum><rsub|l=1,l\<neq\>i><rsup|K*n<rsub|r>>\<Lambda\><rsub|l><rsup|k>|)>>,<eq-number><label|prob>>>>>>

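  and pass them back to the observation nodes (Fig. <reference|fig10>b).
  There they are used to evaluate (<reference|mu>) and (<reference|sigma2>)
  in the next iteration. For BPSK,
  <math|\<bbb-E\><around|(|x<rsub|j>|)>=2*p<rsub|i><rsup|j+>-1> and
  <math|<text|Var><around|(|x<rsub|j>|)>=1-<around|(|2*p<rsub|i><rsup|j+>-1|)><rsup|2>>.
  This message passing is carried out for a fixed number of iterations.
  Messages can be damped as described in Section <reference|sec> before
  being passed. Finally, <math|x<rsub|k>> is detected as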

  <eqnarray|<tformat|<table|<row|<cell|<wide|x|^><rsub|k>>|<cell|=>|<cell|<text|sgn><around*|(|<big|sum><rsub|i=1><rsup|K*n<rsub|r>>\<Lambda\><rsub|i><rsup|k>|)>.<eq-number>>>>>>

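  Note that approximating the interference as Gaussian greatly simplifies
  the computation of messages. Without it, each message would require
  marginalizing over the <math|2<rsup|K*n<rsub|t>-1>> possible values of the
  interfering symbols. With it, each observation node only needs the mean
  and variance in (<reference|mu>) and (<reference|sigma2>), as the
  complexity discussion in the following subsection shows.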
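  A compact NumPy sketch of the FG-GAI detector described above follows.
  It is an illustration under stated assumptions, not the authors'
  implementation, and the function name is hypothetical. The inputs
  <math|<with|font-series|bold|H>> and <math|<with|font-series|bold|r>> are
  the stacked channel matrix and received vector of (<reference|eqn1>). The
  probabilities <math|p<rsub|i><rsup|j+>> are converted to
  <math|\<bbb-E\><around|(|x<rsub|j>|)>> and
  <math|<text|Var><around|(|x<rsub|j>|)>> separately for each observation
  node. Damping is applied to these probabilities, which is only one
  possible choice. The sums over <math|j\<neq\>k> and <math|l\<neq\>i> are
  evaluated as a full sum minus one term. Each iteration therefore costs a
  number of operations proportional to the number of entries of
  <math|<with|font-series|bold|H>>.

```python
import numpy as np


def fg_gai_detect(H, r, sigma2, num_iter=10, alpha=0.4):
    """Sketch of FG-GAI BP detection for BPSK (num_iter must be at least 1).

    H: complex channel, one row per observation node and one column per
    variable node; r: received vector; sigma2: noise variance as in (sigma2).
    P[i, k] holds p_i^{k+}, the message from variable node k to observation node i.
    """
    P = np.full(H.shape, 0.5)                 # uniform initial messages
    abs2 = np.abs(H) ** 2
    for _ in range(num_iter):
        mean = 2.0 * P - 1.0                  # E(x_j) for BPSK, per observation node
        var = 1.0 - mean ** 2                 # Var(x_j) for BPSK
        A = H * mean
        B = abs2 * var
        # Sum over j != k computed as the full row sum minus the k-th term
        mu = A.sum(axis=1, keepdims=True) - A                      # eq. (mu)
        s2 = B.sum(axis=1, keepdims=True) - B + sigma2             # eq. (sigma2)
        Lam = 4.0 / s2 * np.real(np.conj(H) * (r[:, None] - mu))   # eq. (LLR)
        S = Lam.sum(axis=0)                   # total LLR at each variable node
        ext = S[None, :] - Lam                # leave out observation node i, eq. (prob)
        P_new = 0.5 * (1.0 + np.tanh(ext / 2.0))   # equals exp(ext) / (1 + exp(ext))
        P = alpha * P + (1.0 - alpha) * P_new      # damped messages
    # Hard decision sgn(sum_i Lambda_i^k); a zero LLR maps to +1
    return np.where(np.signbit(S), -1.0, 1.0), S


rng = np.random.default_rng(3)
N_r, N_t, sigma2 = 64, 4, 0.01
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=N_t)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N_r) + 1j * rng.standard_normal(N_r))
x_hat, llr = fg_gai_detect(H, H @ x + noise, sigma2)
print("symbol errors:", int(np.sum(x_hat != x)))
```

  With a single column in <math|<with|font-series|bold|H>> there is no
  interference. The returned soft output then reduces to the matched-filter
  LLR
  <math|4*\<Re\><around|(|<with|font-series|bold|h><rsup|H>*<with|font-series|bold|r>|)>/\<sigma\><rsup|2>>,
  which is a simple correctness check.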

  <subsection|Computational Complexity><label|sec-fg-complexity>

  The computational complexity of the FG-GAI BP algorithm above involves
  <math|i)> LLR calculations at the observation nodes as per
  (<reference|LLR>), which has <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>>
  complexity, and <math|i*i>) calculation of probabilities at variable nodes
  as per (<reference|prob>), which also requires
  <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>> complexity<footnote|A
  naive implementation of (<reference|LLR>) would require a summation over
  <math|K*n<rsub|t>-1> variable nodes for each message, amounting to a
  complexity of order <math|O*<around|(|K<rsup|3>*n<rsub|t><rsup|2>*n<rsub|r>|)>>.
  However, the summation over <math|K*n<rsub|t>-1> variables in
  (<reference|mu>) can be written in the form
  <math|<big|sum><rsub|j=1><rsup|K*n<rsub|t>>h<rsub|i*j>*\<bbb-E\><around|(|x<rsub|j>|)>-h<rsub|i*k>*\<bbb-E\><around|(|x<rsub|k>|)>>,
  where the computation of the full summation from <math|j=1> to
  <math|K*n<rsub|t>> (which is independent of the variable index <math|k>)
  requires <math|K*n<rsub|t>-1> additions; in addition, one subtraction is
  required for each <math|k>. This makes the complexity order for
  computing (<reference|mu>) only <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>>.
  A similar argument holds for computation of the variance in
  (<reference|sigma2>), and hence the complexity of computing the LLR in
  (<reference|LLR>) becomes <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>>.
  Likewise, a similar rewriting of the summation in (<reference|prob>) leads
  to a complexity of <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>>.>.
  Hence, the overall complexity of the algorithm is
  <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>> for detecting
  <math|K*n<rsub|t>> transmitted symbols, so the per-symbol complexity is
  just <math|O*<around|(|K*n<rsub|t>|)>> for <math|n<rsub|t>=n<rsub|r>>. This
  is one order lower than the per-symbol complexity of the MRF based approach
  in the previous section. Because its per-symbol complexity is linear in
  <math|K> and <math|n<rsub|t>>, the proposed FG approach with GAI is
  attractive for detection in large-dimension MIMO-ISI channels. In addition,
  it achieves near-optimal BER performance in large dimensions, as the
  simulation results in the following subsection show.
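  The rewriting used in the footnote above, one full sum followed by one
  subtraction per excluded index, can be checked with a short sketch. The
  function names are hypothetical; h plays the role of a row
  <math|h<rsub|i*j>> and e the role of the means
  <math|\<bbb-E\><around|(|x<rsub|j>|)>>.

```python
def leave_one_out_naive(h, e):
    """For each k, sum h[j]*e[j] over j != k directly: about n*(n-1) additions."""
    n = len(h)
    return [sum(h[j] * e[j] for j in range(n) if j != k) for k in range(n)]


def leave_one_out_fast(h, e):
    """Full sum once (n-1 additions), then one subtraction per k: O(n) total."""
    full = sum(hj * ej for hj, ej in zip(h, e))
    return [full - hk * ek for hk, ek in zip(h, e)]
```

  Applied to every observation node, the fast form turns the per-node cost
  from quadratic into linear in <math|K*n<rsub|t>>, which is the source of the
  reduction from <math|O*<around|(|K<rsup|3>*n<rsub|t><rsup|2>*n<rsub|r>|)>>
  to <math|O*<around|(|K<rsup|2>*n<rsub|t>*n<rsub|r>|)>>.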

  <subsection|Simulation Results><label|sec>

  Figure <reference|fig11> shows the simulated BER performance of the FG-GAI
  BP algorithm in <math|n<rsub|t>\<times\>n<rsub|r>> V-BLAST MIMO with
  <math|n<rsub|t>=n<rsub|r>=8,16,24,32,64> and BPSK on flat fading
  (<math|L=K=1>). The number of BP iterations and the message damping factor
  used are 10 and 0.4, respectively. We observe that, like the MRF approach,
  the FG-GAI approach also exhibits large-dimension behavior; e.g.,
  <math|32\<times\>32> and <math|64\<times\>64> V-BLAST systems perform close
  to unfaded SISO AWGN performance. Similar large-dimension behavior is shown
  in Fig. <reference|fig12> in MIMO-ISI channels with <math|L=6> and
  <math|K=64> for <math|n<rsub|t>=n<rsub|r>=4,8,16>; i.e., BERs move
  increasingly closer to unfaded SISO AWGN BER for increasing
  <math|K*n<rsub|t>=256,512,1024>. Figure <reference|fig13> presents a
  comparison of the performances achieved by the MRF and FG-GAI approaches
  for the following system settings: <math|n<rsub|t>=n<rsub|r>=4>,
  <math|<around|[|L=5,K=25|]>>, <math|<around|[|L=20,K=100|]>>, and BPSK. For
  these settings, the FG with GAI approach performs almost the same as the MRF
  approach, at one order lower complexity.

  Figure <reference|fig13a> presents a comparison of the performances
  achieved by the proposed scheme and the scheme in <cite|wo> for
  <math|n<rsub|t>=n<rsub|r>=4>, <math|<around|[|L=4,K=400|]>>, and BPSK.
  While the scheme in <cite|wo> exhibits an error floor, the proposed scheme
  avoids flooring and achieves much better performance. This gain is
  attributed to the joint equalization of all the <math|K*n<rsub|t>> symbols
  in a frame. The complexity of the scheme in <cite|wo> is
  <math|O*<around|(|L*n<rsub|t>|)>>, whereas that of the proposed scheme is
  <math|O*<around|(|K*n<rsub|t>|)>>. Although <math|K\<gtr\>L>, the linear
  complexity of the proposed scheme in <math|K> is still attractive. Also, as
  with MRF BP, the FG-GAI BP algorithm in MIMO-CPSC performs significantly
  better than MIMO-OFDM, even when the latter uses ML detection.

  <big-figure|<space|-3mm><image|fig11.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig11><vspace|-4mm>|BER performance of the FG-GAI BP
  algorithm in V-BLAST MIMO systems with <math|n<rsub|t>=n<rsub|r>=8,16,24,32,64>
  on flat fading (<math|L=K=1>). # BP iterations = 20,
  <math|\<alpha\><rsub|m>=0.4>.>

  <big-figure|<space|-3mm><image|fig12.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig12>|BER performance of the FG-GAI BP algorithm in
  MIMO-ISI channels with <math|<around|[|L=6,K=64|]>> for
  <math|n<rsub|t>=n<rsub|r>=4,8,16>. Uniform power delay profile, # BP
  iterations = 10, <math|\<alpha\><rsub|m>=0.4>.>

  <big-figure|<space|-3mm><image|fig13.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig13>|Comparison of the BER performances of the MRF BP
  and FG-GAI BP algorithms in MIMO-ISI channels with
  <math|n<rsub|t>=n<rsub|r>=4>, <math|<around|[|L=5,K=25|]>>,
  <math|<around|[|L=20,K=100|]>>, uniform power delay profile. >

  <big-figure|<space|-3mm><image|fig13a.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig13a>|Comparison of the BER performances of the
  FG-GAI BP scheme and the scheme in <cite|wo> in MIMO-ISI channels with
  <math|n<rsub|t>=n<rsub|r>=4>, <math|<around|[|L=4,K=400|]>>, uniform power
  delay profile. >

  <section|Hybrid Algorithms Using BP and Local Neighborhood Search for
  <math|M>-QAM><label|sec5>

  The BP algorithms proposed in the previous two sections are for BPSK
  modulation, i.e., for <math|<with|font-series|bold|x>\<in\>{\<pm\>1}<rsup|K*n<rsub|t>>>.
  They also apply to 4-QAM if the transmit symbol vector is viewed as an
  element of <math|{\<pm\>1}<rsup|2*K*n<rsub|t>>>. Low-complexity algorithms
  for detection/equalization of higher-order <math|M>-QAM,
  <math|M\<gtr\>4>, over large-dimension MIMO-ISI channels are therefore of
  interest. A BP based algorithm suited for higher-order QAM in MIMO has been
  reported recently in <cite|gta>. The algorithm in <cite|gta> uses a
  Gaussian tree approximation (GTA) to convert the fully-connected graph
  representing the MIMO system into a tree, and carries out BP on the
  resulting approximate tree. We refer to this algorithm as the GTA BP
  algorithm. In this section, we take an alternative hybrid approach for
  efficient detection of <math|M>-QAM signals, where the proposed FG-GAI BP
  algorithm for BPSK is used to improve the <math|M>-QAM detection
  performance of local neighborhood search algorithms. Simulation results
  (Fig. <reference|fig15>) show that the proposed hybrid approach performs
  better than the GTA BP approach in <cite|gta>.

  <with|font-shape|italic|Local Neighborhood Search Based Detection:>
  Low-complexity search algorithms that attempt to minimize the
  maximum-likelihood (ML) cost <math|<around|\<\|\|\>|<with|font-series|bold|r>-<with|font-series|bold|H*x>|\<\|\|\>><rsup|2>>
  by limiting the search to a local neighborhood have been proposed for
  detection of <math|M>-QAM signals in MIMO, e.g., the tabu search (TS)
  algorithm <cite|tabu1>-<cite|isi_gcom09>. Such local neighborhood search
  algorithms have low complexity (e.g., TS algorithms, like the proposed MRF
  BP algorithm, have per-symbol complexity quadratic in <math|K*n<rsub|t>>),
  which makes them suited for large dimensions. However, their higher-order
  QAM performance is far from optimal. Here, we propose to improve the
  <math|M>-QAM performance of these search algorithms by applying the
  proposed BP algorithms to the search algorithm outputs. This approach
  improves the reliability of the output symbols from the local neighborhood
  search, thereby improving the overall BER performance. We apply this hybrid
  approach to the reactive tabu search (RTS) algorithm in <cite|isi_gcom09>.

  <with|font-shape|italic|Hybrid RTS-BP Approach:> In the following
  subsections, we first present a brief summary of the RTS algorithm in
  <cite|isi_gcom09> and the motivation behind the proposed hybrid approach.
  Next, we present the proposed hybrid RTS-BP algorithm and its BER
  performance. Finally, we present a method to reduce complexity based on the
  simulated pdf of the ML cost of the RTS output vector.

  <subsection|Reactive Tabu Search (RTS) Algorithm><label|sec>

  Here, we present a brief summary of the RTS algorithm in <cite|isi_gcom09>.
  The RTS algorithm starts with an initial solution vector, defines a
  neighborhood around it (i.e., a set of neighboring vectors based on a
  neighborhood criterion), and moves to the best vector among the
  neighboring vectors, even if this vector is worse, in terms of the ML cost
  <math|<around|\<\|\|\>|<with|font-series|bold|r>-<with|font-series|bold|H*x>|\<\|\|\>><rsup|2>>,
  than the current solution vector; this allows the algorithm to escape from
  local minima. This process is continued for a certain number of iterations,
  after which the algorithm is terminated and the best among the solution
  vectors in all the iterations is declared as the final solution vector. In
  defining the neighborhood of the solution vector in a given iteration, the
  algorithm attempts to avoid cycling by marking moves to the solution
  vectors of the past few iterations as `tabu' (i.e., prohibiting these
  moves), which ensures an efficient search of the solution space. The
  number of these past iterations is parametrized as the `tabu period,'
  which is dynamically changed depending on the number of repetitions of
  solution vectors observed in the search path (e.g., the tabu period is
  increased if more repetitions are observed). The per-symbol complexity of
  the RTS algorithm is quadratic in <math|K*n<rsub|t>> for
  <math|n<rsub|t>=n<rsub|r>>.
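  A simplified tabu search on the real-valued model can be sketched as
  follows. This is a stand-in under stated assumptions, not the RTS of
  <cite|isi_gcom09>: the neighborhood is assumed to be all vectors that change
  one coordinate to another PAM level, a move is made tabu by forbidding a
  coordinate to return to the value it just left for `period` iterations, and
  the reactive rule is assumed to increase the period by p_step whenever a
  previously visited vector recurs. Neighbor costs use the standard expansion
  ||r - H(x + lam e_i)||^2 = ||e||^2 - 2 lam h_i^T e + lam^2 f_ii with
  e = r - Hx. The name rts_detect and its parameters are hypothetical.

```python
def rts_detect(H, r, alphabet, x0, max_iter=60, p0=2, p_step=1):
    """Simplified reactive tabu search minimizing ||r - H x||^2 over alphabet^n.

    H: list of m rows of length n (real); r: list of length m.
    Returns (best vector found, its cost)."""
    m, n = len(H), len(H[0])
    cols = [[H[row][i] for row in range(m)] for i in range(n)]
    f_diag = [sum(c * c for c in col) for col in cols]  # f_ii = h_i^T h_i
    x = list(x0)
    e = [r[row] - sum(H[row][i] * x[i] for i in range(n)) for row in range(m)]
    cost = sum(v * v for v in e)
    best_x, best_cost = list(x), cost
    tabu = {}  # (coordinate, value) -> last iteration at which that move is tabu
    period = p0
    visited = {tuple(x)}
    for it in range(max_iter):
        moves = []
        for i in range(n):
            corr = sum(cols[i][row] * e[row] for row in range(m))  # h_i^T e
            for a in alphabet:
                if a != x[i]:
                    lam = a - x[i]
                    moves.append((-2.0 * lam * corr + lam * lam * f_diag[i], i, a))
        allowed = [mv for mv in moves if tabu.get((mv[1], mv[2]), -1) < it]
        # Best non-tabu neighbor, accepted even if it increases the cost.
        delta, i, a = min(allowed or moves)
        tabu[(i, x[i])] = it + period  # forbid returning to the old value
        lam = a - x[i]
        x[i] = a
        e = [e[row] - lam * cols[i][row] for row in range(m)]
        cost += delta
        if tuple(x) in visited:
            period += p_step  # reactive rule (assumed): lengthen on repetition
        else:
            visited.add(tuple(x))
        if cost < best_cost:
            best_x, best_cost = list(x), cost
    return best_x, best_cost
```

  In a noiseless 3x3 example with 4-PAM levels and a diagonally dominant
  channel, the search starting from the all-(-3) vector reaches the
  transmitted vector within a few moves.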

  <subsection|Motivation for Hybrid RTS-BP Algorithm><label|sec>

  The proposed hybrid RTS-BP approach is motivated by the following two
  observations we made in our BER simulations of the RTS algorithm: <math|i)>
  the RTS algorithm performed very close to optimal in large dimensions for
  4-QAM, but its higher-order QAM performance was far from optimal, and
  <math|i*i)> at moderate to high SNRs, when an RTS output vector was in
  error, the least significant bits (LSBs) of the data symbols were more
  likely to be in error than the other bits. An analytical justification of
  the second observation is as follows.

  Let the transmitted symbols take values from the <math|M>-QAM alphabet
  <math|\<bbb-A\>>, so that <math|<with|font-series|bold|x>\<in\>\<bbb-A\><rsup|K*n<rsub|t>>>
  is the transmitted vector. Consider the real-valued system model
  corresponding to (<reference|eqn1>), given by
  <math|<with|font-series|bold|r><rprime|'>=<with|font-series|bold|H><rprime|'>*<space|0.17em><with|font-series|bold|x><rprime|'>+<with|font-series|bold|v><rprime|'>>,
  where

  <eqnarray|<tformat|<table|<row|<cell|<label|SystemModelRealDef><space|14mm><with|font-series|bold|H><rprime|'>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|\<Re\><around|(|<with|font-series|bold|H>|)><space|2mm>-\<Im\><around|(|<with|font-series|bold|H>|)>>>|<row|<cell|\<Im\><around|(|<with|font-series|bold|H>|)>*<space|5mm>\<Re\><around|(|<with|font-series|bold|H>|)>>>>>>|]>,<space|1em><with|font-series|bold|r><rprime|'>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|\<Re\><around|(|<with|font-series|bold|r>|)>>>|<row|<cell|\<Im\><around|(|<with|font-series|bold|r>|)>>>>>>|]>,>>>>>

  <eqnarray|<tformat|<table|<row|<cell|<with|font-series|bold|x><rprime|'>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|\<Re\><around|(|<with|font-series|bold|x>|)>>>|<row|<cell|\<Im\><around|(|<with|font-series|bold|x>|)>>>>>>|]>,<space|1em><with|font-series|bold|v><rprime|'>=<around*|[|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|\<Re\><around|(|<with|font-series|bold|v>|)>>>|<row|<cell|\<Im\><around|(|<with|font-series|bold|v>|)>>>>>>|]>.<eq-number>>>>>>

  The vector <math|<with|font-series|bold|x><rprime|'>> is of size
  <math|2*K*n<rsub|t>\<times\>1>; the entries of
  <math|<around|[|x<rsub|1><rprime|'>,\<cdots\>,x<rsub|K*n<rsub|t>><rprime|'>|]>>
  can be viewed as drawn from an underlying <math|<sqrt|M>>-PAM signal set,
  and so can those of
  <math|<around|[|x<rsub|K*n<rsub|t>+1><rprime|'>,\<cdots\>,x<rsub|2*K*n<rsub|t>><rprime|'>|]>>.
  Let <math|\<bbb-B\>=<around|{|a<rsub|1>,a<rsub|2>,\<cdots\>,a<rsub|<sqrt|M>>|}>>
  denote the <math|<sqrt|M>>-PAM alphabet from which each
  <math|x<rsub|i><rprime|'>> takes its value.
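  The real-valued model can be formed mechanically from the complex one. The
  sketch below builds the stacked matrix and vectors exactly as defined above,
  using Python's built-in complex numbers; the helper names real_valued_model,
  stack and matvec are hypothetical.

```python
def real_valued_model(H, r):
    """Return (H', r') with H' = [[Re H, -Im H], [Im H, Re H]], r' = [Re r; Im r].

    H: list of rows of complex numbers; r: list of complex numbers."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in H]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in H]
    r_prime = [z.real for z in r] + [z.imag for z in r]
    return top + bottom, r_prime


def stack(x):
    """x' = [Re(x); Im(x)] (the same stacking is used for v')."""
    return [z.real for z in x] + [z.imag for z in x]


def matvec(A, v):
    """Plain matrix-vector product for lists of rows (real or complex)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]
```

  For any complex H and x, H' x' equals the stacked real and imaginary parts
  of H x, which is what makes the two models equivalent in the noiseless part.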

  Let <math|<wide|<with|font-series|bold|x>|^><rprime|'>> denote the detected
  output vector from the RTS algorithm corresponding to the transmitted
  vector <math|<with|font-series|bold|x><rprime|'>>. Expanding the
  <math|<sqrt|M>>-PAM symbols in terms of <math|\<pm\>1>'s, we can write
  each entry of <math|<wide|<with|font-series|bold|x>|^><rprime|'>> as a
  linear combination of <math|\<pm\>1>'s as

  <eqnarray|<tformat|<table|<row|<cell|<wide|x|^><rsub|i><rprime|'>>|<cell|=>|<cell|<big|sum><rsub|j=0><rsup|N-1>2<rsup|j>*<space|0.17em><wide|b|^><rsub|i><rsup|<around|(|j|)>>,<space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em><space|0.17em>i=1,\<cdots\>,2*K*n<rsub|t>,<eq-number><label|linearComb>>>>>>

  where <math|N=log<rsub|2> <sqrt|M>> and <math|<wide|b|^><rsub|i><rsup|<around|(|j|)>>\<in\>{\<pm\>1}>.
  We note that the RTS algorithm outputs a local minimum as the solution
  vector. So, <math|<wide|<with|font-series|bold|x>|^><rprime|'>>, being a
  local minimum, satisfies the following conditions:

  <vspace|-2mm>

  <with|font-size|0.84|<eqnarray|<tformat|<table|<row|<cell|<around|\<\|\|\>|<with|font-series|bold|r><rprime|'>-<with|font-series|bold|H><rprime|'>*<wide|<with|font-series|bold|x>|^><rprime|'>|\<\|\|\>><rsup|2><space|0.17em>\<le\><space|0.17em><around|\<\|\|\>|<with|font-series|bold|r><rprime|'>-<with|font-series|bold|H><rprime|'>*<around|(|<wide|<with|font-series|bold|x>|^><rprime|'>+\<lambda\><rsub|i>*<with|font-series|bold|e><rsub|i>|)>|\<\|\|\>><rsup|2>,<space|0.17em><space|0.17em>\<forall\>i=1,\<cdots\>,2*K*n<rsub|t>,<eq-number><label|localminima>>>>>>>

  <vspace|-4mm><space|-5mm>where <math|\<lambda\><rsub|i>=<around|(|a<rsub|q>-<wide|x|^><rsub|i><rprime|'>|)>,<space|0.17em>q=1,\<cdots\>,<sqrt|M>>,
  and <math|<with|font-series|bold|e><rsub|i>> denotes the <math|i>th column
  of the identity matrix. Defining <math|<with|font-series|bold|F><rprime|'><Define><with|font-series|bold|H><rprime|'><rsup|T>*<with|font-series|bold|H><rprime|'>>
  and denoting the <math|i>th column of <math|<with|font-series|bold|H><rprime|'>>
  as <math|<with|font-series|bold|h><rsub|i>>, the conditions in
  (<reference|localminima>) reduce to

  <eqnarray|<tformat|<table|<row|<cell|2*\<lambda\><rsub|i>*<with|font-series|bold|r><rprime|'><rsup|T>*<with|font-series|bold|h><rsub|i>>|<cell|\<le\>>|<cell|2*\<lambda\><rsub|i>*<around|(|<with|font-series|bold|H><rprime|'>*<wide|<with|font-series|bold|x>|^><rprime|'>|)><rsup|T>*<with|font-series|bold|h><rsub|i>+\<lambda\><rsub|i><rsup|2>*f<rsub|i*i>,<eq-number><label|reducedcondition>>>>>>

  where <math|f<rsub|i*j>> denotes the <math|<around|(|i,j|)>>th element of
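  <math|<with|font-series|bold|F><rprime|'>>. Under moderate to high SNR
  conditions, ignoring the noise (i.e., setting
  <math|<with|font-series|bold|r><rprime|'>=<with|font-series|bold|H><rprime|'>*<with|font-series|bold|x><rprime|'>>)
  and dividing both sides by <math|<around|\||\<lambda\><rsub|i>|\|>>,
  (<reference|reducedcondition>) can be further reduced to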

  <eqnarray|<tformat|<table|<row|<cell|2*<around|(|<with|font-series|bold|x<rprime|'>>-<wide|<with|font-series|bold|x>|^><rprime|'>|)><rsup|T>*<with|font-series|bold|f><rsub|i><space|0.17em><text|sgn><around|(|\<lambda\><rsub|i>|)>>|<cell|\<le\>>|<cell|\<lambda\><rsub|i>*f<rsub|i*i><space|0.17em><text|sgn><around|(|\<lambda\><rsub|i>|)>,<eq-number><label|finalcondn1>>>>>>

  where <math|<with|font-series|bold|f><rsub|i>> denotes the <math|i>th
  column of <math|<with|font-series|bold|F><rprime|'>>. For Rayleigh fading,
  <math|f<rsub|i*i>> is (scaled) chi-square distributed with
  <math|2*K*n<rsub|t>> degrees of freedom and mean <math|K*n<rsub|t>>.
  Approximating the distribution of <math|f<rsub|i*j>>, <math|i\<ne\>j>, as
  normal with mean zero and variance <math|<frac|K*n<rsub|t>|4>> by the
  central limit theorem, we can drop the
  sgn<math|<around|(|\<lambda\><rsub|i>|)>> on the LHS of
  (<reference|finalcondn1>), since multiplying these zero-mean symmetric
  terms by <math|\<pm\>1> does not change their distribution, while on the
  RHS <math|\<lambda\><rsub|i><space|0.17em><text|sgn><around|(|\<lambda\><rsub|i>|)>=<around|\||\<lambda\><rsub|i>|\|>>.
  Using the fact that the minimum value of
  <math|<around|\||\<lambda\><rsub|i>|\|>> is 2, (<reference|finalcondn1>)
  can be simplified as <vspace|-1mm>

  <eqnarray|<tformat|<table|<row|<cell|<big|sum><rsub|x<rsub|j><rprime|'>\<ne\><wide|x|^><rsub|j><rprime|'>>\<Delta\><rsub|j>*f<rsub|i*j>>|<cell|\<le\>>|<cell|f<rsub|i*i>,<eq-number><label|finalcondn2>>>>>>

  where <math|\<Delta\><rsub|j>=x<rsub|j><rprime|'>-<wide|x|^><rsub|j><rprime|'>>.
  Also, if <math|x<rsub|i><rprime|'>=<wide|x|^><rsub|i><rprime|'>>, the
  normal approximation above gives

  <eqnarray|<tformat|<table|<row|<cell|<big|sum><rsub|x<rsub|j><rprime|'>\<ne\><wide|x|^><rsub|j><rprime|'>>\<Delta\><rsub|j>*f<rsub|i*j>>|<cell|\<sim\>>|<cell|\<cal-N\>*<around*|(|0,<frac|K*n<rsub|t>|4>*<big|sum><rsub|x<rsub|j><rprime|'>\<ne\><wide|x|^><rsub|j><rprime|'>>\<Delta\><rsub|j><rsup|2>|)>.<eq-number><label|finalline>>>>>>

  Now, since the LHS of (<reference|finalcondn2>) is zero-mean normal with
  variance proportional to
  <math|<big|sum><rsub|x<rsub|j><rprime|'>\<ne\><wide|x|^><rsub|j><rprime|'>>\<Delta\><rsub|j><rsup|2>>
  and the RHS is positive, the inequality holds with higher probability when
  this variance is small; hence the <math|\<Delta\><rsub|i>>'s,
  <math|\<forall\>i>, take smaller values with higher probability.
  Consequently, the symbols of
  <math|<wide|<with|font-series|bold|x>|^><rprime|'>> are nearest Euclidean
  neighbors of the corresponding symbols of the transmitted vector with
  high probability<footnote|Because the <math|x<rsub|i><rprime|'>>'s and
  <math|<wide|x|^><rsub|i><rprime|'>>'s take values from the PAM alphabet
  <math|\<bbb-B\>>, <math|<wide|x|^><rsub|i><rprime|'>> is said to be a
  Euclidean nearest neighbor of <math|x<rsub|i><rprime|'>> if
  <math|<around|\||x<rsub|i><rprime|'>-<wide|x|^><rsub|i><rprime|'>|\|>=2>.>.
  Now, because of the symbol-to-bit mapping in (<reference|linearComb>),
  <math|<wide|x|^><rsub|i><rprime|'>> differs from each of its nearest
  Euclidean neighbors certainly in the LSB position, and may or may not
  differ in the other bit positions: flipping
  <math|<wide|b|^><rsub|i><rsup|<around|(|j|)>>> changes the symbol by
  <math|\<pm\>2<rsup|j+1>>, so a change of exactly <math|\<pm\>2> requires
  the LSB to flip. Consequently, the LSBs of the symbols in the RTS output
  <math|<wide|<with|font-series|bold|x>|^><rprime|'>> are the least reliable.
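  The LSB property of the mapping in (<reference|linearComb>) can be checked
  exhaustively for small <math|N>. The sketch below builds every level from
  its bits, inverts the mapping through the identity
  x + 2^N - 1 = 2 sum_j 2^j (b^(j) + 1)/2, and verifies that adjacent levels
  (spacing 2) always differ in the LSB. All function names are hypothetical.

```python
from itertools import product


def pam_from_bits(bits):
    """Symbol = sum_j 2^j * b^(j); bits[0] is the LSB, each bit in {+1, -1}."""
    return sum((2 ** j) * b for j, b in enumerate(bits))


def bits_from_pam(x, N):
    """Inverse map: (x + 2^N - 1) / 2 is an integer whose binary digits are (b + 1) / 2."""
    u = (int(x) + 2 ** N - 1) // 2
    return tuple(2 * ((u >> j) & 1) - 1 for j in range(N))


def lsb_flips_between_neighbors(N):
    """True if every pair of adjacent levels differs in its LSB."""
    levels = sorted(pam_from_bits(bits) for bits in product((-1, 1), repeat=N))
    return all(
        bits_from_pam(a, N)[0] != bits_from_pam(b, N)[0]
        for a, b in zip(levels, levels[1:])
    )
```

  For N = 2 the levels are -3, -1, 1, 3; the neighbors -1 and 1 differ in both
  bits, while -3 and -1 differ only in the LSB, but no adjacent pair shares its
  LSB.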

  The above observation then led us to consider improving the reliability of
  the LSBs of the RTS output using the proposed FG-GAI BP algorithm presented
  in Section <reference|sec4>, and iterate between RTS and FG-GAI BP as
  follows.

  <big-figure|<with|par-mode|center|<image|fig14.eps|3.45in|1.2in||>
  <vspace|-4mm><label|fig14>>|Hybrid RTS-BP algorithm.>

  <subsection|Proposed Hybrid RTS-BP Algorithm><label|sec>

  Figure <reference|fig14> shows the block schematic of the proposed hybrid
  RTS-BP algorithm. The following four steps constitute the proposed
  algorithm.

  <\itemize>
    <item><with|font-shape|italic|Step 1:> Obtain
    <math|<wide|<with|font-series|bold|x>|^><rprime|'>> using the RTS
    algorithm. Obtain the output bits <math|<wide|b|^><rsub|i><rsup|<around|(|j|)>>>,
    <math|i=1,\<cdots\>,2*K*n<rsub|t>>, <math|j=0,\<cdots\>,N-1>, from
    <math|<wide|<with|font-series|bold|x>|^><rprime|'>> and
    (<reference|linearComb>).

    <item><with|font-shape|italic|Step 2:> Using the
    <math|<wide|b|^><rsub|i><rsup|<around|(|j|)>>>'s from Step 1, reconstruct
    the interference from all bits other than the LSBs <left|(>i.e.,
    interference from all bits other than
    <math|<wide|b|^><rsub|i><rsup|<around|(|0|)>>>'s<right|)> as

    <eqnarray|<tformat|<table|<row|<cell|<wide|<with|font-series|bold|I>|~>>|<cell|=>|<cell|<big|sum><rsub|j=1><rsup|N-1>2<rsup|j>*<space|0.17em><with|font-series|bold|H><rprime|'>*<space|0.17em><wide|<with|font-series|bold|b>|^><rsup|<around|(|j|)>>,<eq-number><label|intf>>>>>>

    where <math|<wide|<with|font-series|bold|b>|^><rsup|<around|(|j|)>>=<around*|[|<wide|b|^><rsub|1><rsup|<around|(|j|)>>,<wide|b|^><rsub|2><rsup|<around|(|j|)>>,\<ldots\>,<wide|b|^><rsub|2*K*n<rsub|t>><rsup|<around|(|j|)>>|]><rsup|T>>.
    Cancel the reconstructed interference in (<reference|intf>) from
    <math|<with|font-series|bold|r><rprime|'>> as

    <eqnarray|<tformat|<table|<row|<cell|<wide|<with|font-series|bold|r>|~><rprime|'>>|<cell|=>|<cell|<with|font-series|bold|r><rprime|'>-<wide|<with|font-series|bold|I>|~>.<eq-number>>>>>>

    <item><with|font-shape|italic|Step 3:> Run the FG-GAI BP algorithm in
    Section <reference|sec4> on the vector
    <math|<wide|<with|font-series|bold|r>|~><rprime|'>> in Step 2, and obtain
    an estimate of the LSBs. Denote this LSB output vector from FG-GAI BP as
    <math|<wide|<wide|<with|font-series|bold|b>|^>|^><rsup|<around|(|0|)>>>.
    Now, using <math|<wide|<wide|<with|font-series|bold|b>|^>|^><rsup|<around|(|0|)>>>
    from the BP output, and the <math|<wide|<with|font-series|bold|b>|^><rsup|<around|(|j|)>>>,
    <math|j=1,\<cdots\>,N-1> from the RTS output in Step 1, reconstruct the
    symbol vector as

    <eqnarray|<tformat|<table|<row|<cell|<wide|<wide|<with|font-series|bold|x><rprime|'>|^>|^>>|<cell|=>|<cell|<wide|<wide|<with|font-series|bold|b>|^>|^><rsup|<around|(|0|)>><space|0.17em>+<big|sum><rsub|j=1><rsup|N-1>2<rsup|j>*<space|0.17em><space|0.17em><wide|<with|font-series|bold|b>|^><rsup|<around|(|j|)>>.<eq-number>>>>>>

    <item><with|font-shape|italic|Step 4:> Repeat Steps 1 to 3 using
    <math|<wide|<wide|<with|font-series|bold|x><rprime|'>|^>|^>> as the
    initial vector to the RTS algorithm.
  </itemize>
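  Steps 1 to 3 can be sketched as plain list operations on the real-valued
  model. The BPSK detection of the LSB plane in Step 3 is represented by a
  hypothetical callable lsb_detector (an FG-GAI BP routine would be plugged in
  there); bit_planes, cancel_upper_bits, reassemble and rts_bp_pass are
  likewise hypothetical names.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def bit_planes(x, N):
    """Step 1: planes[j][i] = b_i^(j) such that x_i = sum_j 2^j b_i^(j)."""
    planes = [[0] * len(x) for _ in range(N)]
    for i, xi in enumerate(x):
        u = int(round((xi + 2 ** N - 1) / 2))
        for j in range(N):
            planes[j][i] = 2 * ((u >> j) & 1) - 1
    return planes


def cancel_upper_bits(H, r, planes):
    """Step 2: r~' = r' - sum_{j>=1} 2^j H' b^(j)."""
    r_tilde = list(r)
    for j in range(1, len(planes)):
        Hb = matvec(H, planes[j])
        r_tilde = [rt - (2 ** j) * v for rt, v in zip(r_tilde, Hb)]
    return r_tilde


def reassemble(lsb, planes):
    """Step 3: new LSB estimate plus the RTS bits b^(j), j >= 1."""
    return [lsb[i] + sum((2 ** j) * planes[j][i] for j in range(1, len(planes)))
            for i in range(len(lsb))]


def rts_bp_pass(H, r, x_rts, N, lsb_detector):
    """One pass of Steps 1-3; lsb_detector(H, r_tilde) returns +/-1 LSBs."""
    planes = bit_planes(x_rts, N)
    r_tilde = cancel_upper_bits(H, r, planes)
    return reassemble(lsb_detector(H, r_tilde), planes)
```

  When the upper bits from RTS are correct and the noise is ignored, the
  cancelled vector equals H' b^(0), so the remaining problem is exactly a BPSK
  detection problem in the LSBs.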

  The algorithm is stopped after a certain number of iterations between RTS
  and BP. Our simulations showed that two iterations between RTS and BP are
  adequate to achieve a good improvement; more than two iterations resulted
  in only marginal further improvement for the system parameters considered.
  Since the complexity of the BP part of RTS-BP is lower than that of the RTS
  part, the per-symbol complexity order of RTS-BP is the same as that of
  RTS, namely <math|O*<around|(|K<rsup|2>*n<rsub|t><rsup|2>|)>>.

  <big-figure|<space|-3mm><image|fig15.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig15>|BER performance comparison between the RTS-BP
  (proposed), RTS, and GTA-BP (in <cite|gta>) in <math|16\<times\>16> V-BLAST
  MIMO with 16-QAM in MIMO-ISI channel with <math|L=6>, <math|K=64>, uniform
  power-delay profile.>

  <subsection|Simulation Results><label|sec>

  Figure <reference|fig15> shows the BER performance of the proposed hybrid
  RTS-BP algorithm in comparison with those of the RTS algorithm and the
  GTA-BP algorithm in <cite|gta> in <math|16\<times\>16> V-BLAST MIMO with
  16-QAM on a frequency-selective channel with <math|L=6> equal-energy
  multipath components and <math|K=64> data vectors per frame. Because BP
  improves the reliability of the LSBs, the RTS-BP algorithm outperforms the
  RTS algorithm without BP. Also, both the RTS-BP and RTS algorithms perform
  better than the GTA-BP algorithm in <cite|gta>.

  <subsection|Complexity Reduction Using Selective BP><label|sec>

  In the proposed RTS-BP algorithm, BP is applied to the RTS output
  unconditionally. However, BP can improve performance only when the RTS
  output is erroneous, so the additional complexity due to BP can be avoided
  by skipping BP whenever the RTS output is error-free. To decide whether to
  use BP, we can use the simulated pdf of the ML cost of the RTS output
  vector, i.e., the pdf of
  <math|M<rsub|1><Define><around|\<\|\|\>|<with|font-series|bold|r><rprime|'>-<with|font-series|bold|H><rprime|'>*<wide|<with|font-series|bold|x>|^><rprime|'>|\<\|\|\>>>.
  Figure <reference|fig16> shows the simulated pdf of <math|M<rsub|1>> for a
  <math|32\<times\>32> V-BLAST MIMO system with 64-QAM at an SNR of 30 dB on
  flat fading (<math|L=K=1>). From Fig. <reference|fig16>, it is seen that a
  comparison of the value of <math|M<rsub|1>> with a suitable threshold can
  give an indication of the reliability of the RTS output. For example, the
  output is more likely to be erroneous if <math|M<rsub|1>\<gtr\>12> in Fig.
  <reference|fig16>.

  <big-figure|<space|-3mm><image|fig16.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig16>|Simulated pdfs of <math|M<rsub|1>>, the ML cost
  of the RTS output vector, in a <math|32\<times\>32> V-BLAST MIMO system
  with 64-QAM and SNR = 30 dB on flat fading (<math|L=K=1>). >

  Based on the above observation, we modify the RTS-BP algorithm as follows:
  BP is used only if <math|M<rsub|1>\<gtr\>\<theta\>>; otherwise, the RTS
  output is taken as the final output. The threshold <math|\<theta\>> has to
  be chosen carefully to achieve good performance. Note that
  <math|\<theta\>=0> corresponds to unconditional RTS-BP, and
  <math|\<theta\>=\<infty\>> corresponds to RTS without BP. For
  <math|\<theta\>=\<infty\>>, there is no additional complexity due to BP,
  but also no performance gain over RTS. For <math|\<theta\>=0>, a
  performance gain over RTS is possible, but the BP complexity is incurred
  for every realization. So there exists a performance-complexity trade-off
  as a function of <math|\<theta\>>. We
  illustrate this trade-off in Fig. <reference|fig17> for a
  <math|32\<times\>32> V-BLAST system with 64-QAM in flat fading. For this
  purpose, we define `SNR gain' in dB for a given threshold <math|\<theta\>>
  as the improvement in SNR achieved by RTS with selective BP using threshold
  <math|\<theta\>> to achieve an uncoded BER of <math|10<rsup|-3>> compared
  to RTS without BP. Likewise, we define `complexity gain' for a given
  <math|\<theta\>> as <math|10*log<rsub|10><around|(|\<beta\>|)>>, where
  <math|\<beta\>> is the ratio of the average number of computations required
  to achieve <math|10<rsup|-3>> uncoded BER in unconditional RTS-BP and that
  in RTS with selective BP using threshold <math|\<theta\>>. In Fig.
  <reference|fig17>, we plot these two gains on the y-axis as a function of
  the threshold <math|\<theta\>>. From this figure, we can observe that for
  <math|\<theta\>> values less than 4, there is not much complexity gain
  since such small threshold values invoke BP more often (i.e., the system
  behaves more like unconditional RTS-BP). Similarly, for <math|\<theta\>>
  values greater than 14, the system behaves more like RTS without BP; i.e.,
  the complexity gain is maximum but there is no SNR gain. Interestingly, for
  <math|\<theta\>> values in the range 4 to 14, maximum SNR gain is retained
  while achieving significant complexity gain as well.
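  The selective rule and the complexity-gain definition above can be sketched
  directly. The sketch assumes that rts and bp_refine are hypothetical
  callables standing in for the RTS detector and the RTS-BP refinement, and it
  computes <math|M<rsub|1>> as the (unsquared) norm, as defined above; the
  function names are also hypothetical.

```python
import math


def selective_rts_bp(H, r, rts, bp_refine, theta):
    """Run BP only when M1 = ||r' - H' x_rts|| exceeds theta.

    Returns (x_hat, bp_used)."""
    x_hat = rts(H, r)
    Hx = [sum(a * b for a, b in zip(row, x_hat)) for row in H]
    m1 = math.sqrt(sum((ri - hi) ** 2 for ri, hi in zip(r, Hx)))
    if m1 > theta:
        return bp_refine(H, r, x_hat), True
    return x_hat, False


def complexity_gain_db(ops_unconditional, ops_selective):
    """10 log10(beta), beta = average operations of unconditional RTS-BP
    divided by those of RTS with selective BP at the same target BER."""
    return 10.0 * math.log10(ops_unconditional / ops_selective)
```

  With theta set to infinity the BP branch is never taken, matching RTS
  without BP; a 10x reduction in average operations corresponds to a 10 dB
  complexity gain.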

  <big-figure|<space|-3mm><image|fig17.eps|3.75in|2.90in||>
  <vspace|-6mm><label|fig17>|SNR gain versus complexity gain trade-off in
  selectively using BP as a function of <math|\<theta\>> in a
  <math|32\<times\>32> V-BLAST MIMO system with 64-QAM at a BER of 0.001 on
  flat fading (<math|L=K=1>). >

  <section|Conclusions><label|sec6>

  In this paper, we demonstrated that belief propagation on graphical models
  including Markov random fields and factor graphs can be efficiently used to
  achieve near-optimal detection in large-dimension MIMO-ISI channels at
  per-symbol complexities that are quadratic and linear in
  <math|K*n<rsub|t>>, respectively. It was shown through simulations that
  damping of messages/beliefs in the MRF BP algorithm can significantly
  improve the BER performance and convergence behavior. The novel Gaussian
  approximation of interference adopted in the factor graph approach yields a
  per-symbol complexity linear in the number of dimensions while achieving
  near-optimal performance in large dimensions. For higher-order QAM,
  iterating between a tabu search algorithm and the proposed FG-GAI BP
  algorithm was shown to improve the bit error performance of the basic tabu
  search algorithm. Although we have
  demonstrated the proposed algorithms in uncoded systems, they can be
  extended to coded systems as well, using either turbo equalization or joint
  processing of the entire coded symbol frame based on low-complexity
  graphical models. Finally, a theoretical analysis of the convergence
  behavior and the bit error performance of the proposed BP algorithms is
  challenging, and remains to be studied.

  <\thebibliography|1>
    <bibitem|fosc98>G. J. Foschini and M. J. Gans, \POn limits of wireless
    communications in a fading environment when using multiple antennas,\Q
    <with|font-shape|italic|Wireless Pers. Commun.,> vol. 6, pp. 311-335,
    March 1998.

    <bibitem|tela99>I. E. Telatar, \PCapacity of multi-antenna Gaussian
    channels,\Q <with|font-shape|italic|European Trans. on Telecommun.,> vol.
    10, no. 6, pp. 585-595, November 1999.

    <bibitem|paulraj>A. Paulraj, R. Nabar, and D. Gore,
    <with|font-shape|italic|Introduction to Space-Time Wireless
    Communications>, Cambridge University Press, 2003.

    <bibitem|proakis>J. G. Proakis, <with|font-shape|italic|Digital
    Communications>, 4th Ed., McGraw-Hill, 2001.

    <bibitem|ngoc>X. Shen, M. Guizani, R. C. Qiu, and T. Le-Ngoc,
    <with|font-shape|italic|Ultra-wideband Wireless Communications and
    Networks,> John Wiley & Sons, 2006.

    <bibitem|uwb0>A. F. Molisch, J. R. Foerster, and M. Pendergrass, \PChannel
    models for ultrawideband personal area networks,\Q
    <with|font-shape|italic|IEEE Wireless Commun.>, vol. 10, no. 6, pp.
    14\U21, December 2003.

    <bibitem|uwb1>A. F. Molisch, \PUltrawideband propagation channels -
    Theory, measurement, and modeling,\Q <with|font-shape|italic|IEEE Trans.
    on Veh. Tech.,> vol. 54, no. 5, pp. 1528-1545, September 2005.

    <bibitem|uwb2>J. Karedal, S. Wyne, P. Almers, F. Tufvesson, and A. F.
    Molisch, \PStatistical analysis of the UWB channel in an industrial
    environment,\Q <with|font-shape|italic|Proc. IEEE VTC'2004-Fall>, pp.
    81-85, September 2004.

    <bibitem|uwb3>R. Saadane and A. M. Hayar, \PDRB1.3 third report on UWB
    channel models,\Q http://www.eurecom.fr/util/publidownload.fr.htm?id=2112,
    Newcom, November 2006.

    <bibitem|frey>B. J. Frey, <with|font-shape|italic|Graphical Models for
    Machine Learning and Digital Communication,> Cambridge: MIT Press, 1998.

    <bibitem|merl>J. S. Yedidia, W. T. Freeman, and Y. Weiss, \PUnderstanding
    belief propagation and its generalizations,\Q
    <with|font-shape|italic|MERL Tech. Rep. TR-2001-22>, January 2002.

    <bibitem|bp_turbo>R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng,
    \PTurbo decoding as an instance of Pearl's belief propagation
    algorithm,\Q <with|font-shape|italic|IEEE Jl. Sel. Areas in Commun.,>
    vol. 16, no. 2, pp. 140-152, February 1998.

    <bibitem|ldpc>D. J. C. MacKay, \PGood error-correcting codes based on
    very sparse matrices,\Q <with|font-shape|italic|IEEE Trans. on Inform.
    Theory>, vol. 45, no. 2, pp. 399-431, March 1999.

    <bibitem|bpmud0>Y. Kabashima, \PA CDMA multiuser detection algorithm on
    the basis of belief propagation,\Q <with|font-shape|italic|Journal of
    Physics A: Mathematical and General>, pp. 11111-11121, October 2003.

    <bibitem|bpmud1>A. Montanari, B. Prabhakar, and D. Tse, \PBelief
    propagation based multiuser detection,\Q Online arXiv:cs/0510044v2
    [cs.IT] 22 May 2006.

    <bibitem|bpmud2>D. Guo and C.-C. Wang, \PMultiuser detection of sparsely
    spread CDMA,\Q <with|font-shape|italic|IEEE JSAC Spl. Iss. on Multiuser
    Detection for Adv. Commun. Systems and Networks>, vol. 26, no. 3, pp.
    421-431, April 2008.

    <bibitem|ieee06>J. Soler-Garrido, R. J. Piechocki, K. Maharatna, and D.
    McNamara, \PAnalog MIMO detection on the basis of belief propagation,\Q
    <with|font-shape|italic|Proc. IEEE Mid-West Symp. on Circuits and
    Systems, 2006>.

    <bibitem|icc07>X. Yang, Y. Xiong, and F. Wang, \PAn adaptive MIMO system
    based on unified belief propagation detection,\Q
    <with|font-shape|italic|Proc. IEEE ICC'2007>, June 2007.

    <bibitem|bp_isit09>Madhekar Suneel, Pritam Som, A. Chockalingam, and B.
    Sundar Rajan, \PBelief propagation based decoding of large non-orthogonal
    STBCs,\Q <with|font-shape|italic|Proc. IEEE ISIT'2009>, Seoul, July 2009.

    <bibitem|itw10>P. Som, T. Datta, A. Chockalingam, and B. S. Rajan,
    \PImproved large-MIMO detection based on damped belief propagation,\Q
    <with|font-shape|italic|Proc. IEEE Inform. Theory Workshop (ITW'2010)>,
    Cairo, January 2010.

    <bibitem|douil_95>C. Douillard, M. Jezequel, and C. Berrou, \PIterative
    correction of intersymbol interference: Turbo equalization,\Q
    <with|font-shape|italic|European Trans. on Telecommunications>, vol. 6,
    pp. 507-511, September-October 1995.

    <bibitem|turbo_eq>M. Tuchler, R. Koetter, and A. C. Singer, \PTurbo
    Equalization: Principles and New Results,\Q <with|font-shape|italic|IEEE
    Trans. on Commun.,> vol. 50, no. 5, pp. 754-767, May 2002.

    <bibitem|teq_mag>R. Koetter, A. C. Singer, and M. Tuchler, \PTurbo
    equalization,\Q <with|font-shape|italic|IEEE Sig. Process. Mag.,> pp.
    67-80, January 2004.

    <bibitem|fg_sp>F. R. Kschischang, B. J. Frey, and H.-A. Loeliger,
    \PFactor graphs and the sum-product algorithm,\Q
    <with|font-shape|italic|IEEE Trans. on Inform. Theory>, vol. 47, no. 2,
    pp. 498-519, February 2001.

    <bibitem|euro_04>M. Tuchler, R. Koetter, and A. C. Singer, \PGraphical
    models for coded data transmission over inter-symbol interference
    channels,\Q <with|font-shape|italic|European Trans. on
    Telecommunications>, vol. 15, no. 4, July/August 2004.

    <bibitem|isi1>O. Shental, A. J. Weiss, N. Shental, and Y. Weiss,
    \PGeneralized belief propagation receiver for near-optimal detection of
    two-dimensional channels with memory,\Q <with|font-shape|italic|Proc.
    IEEE Inform. Theory Workshop>, pp. 225-229, October 2004.

    <bibitem|isi2>G. Colavolpe and G. Germi, \POn the application of factor
    graphs and the sum-product algorithm to ISI channels,\Q
    <with|font-shape|italic|IEEE Trans. on Commun.,> vol. 53, no. 5, pp.
    818-825, May 2005.

    <bibitem|fg_eq>R. J. Drost and A. C. Singer, \PFactor graph algorithms
    for equalization,\Q <with|font-shape|italic|IEEE Trans. on Sig.
    Process.>, vol. 55, no. 5, pp. 2052-2065, May 2007.

    <bibitem|mimo_isi>M. N. Kaynak, T. M. Duman, and E. M. Kurtas, \PBelief
    propagation over MIMO frequency selective fading channels,\Q
    <with|font-shape|italic|Proc. Joint Intl. Conf. on Autonomic and
    Autonomous Systems and Intl. Conf. on Networking and Services>, Papeete,
    Tahiti, October 2005.

    <bibitem|wo>T. Wo and P. A. Hoeher, \PA simple iterative Gaussian
    detector for severely delay-spread MIMO channels,\Q
    <with|font-shape|italic|Proc. IEEE ICC'2007>, June 2007.

    <bibitem|isi_icc2010>Pritam Som and A. Chockalingam, \PDamped belief
    propagation based near-optimal equalization of severely delay-spread UWB
    MIMO-ISI channels,\Q accepted in <with|font-shape|italic|IEEE ICC'2010>,
    Cape Town, May 2010.

    <bibitem|tabu1>F. Glover, \PTabu Search - Part I,\Q
    <with|font-shape|italic|ORSA Jl. of Computing,> vol. 1, no. 3, pp.
    190-206, Summer 1989.

    <bibitem|tabu2>F. Glover, \PTabu Search - Part II,\Q
    <with|font-shape|italic|ORSA Jl. of Computing,> vol. 2, no. 1, pp. 4-32,
    Winter 1990.

    <bibitem|isi_gcom09>N. Srinidhi, Saif K. Mohammed, and A. Chockalingam,
    \PA reactive tabu search based equalizer for severely delay-spread UWB
    MIMO-ISI channels,\Q <with|font-shape|italic|Proc. IEEE GLOBECOM'2009>,
    Honolulu, December 2009.

    <bibitem|Pearl1988>J. Pearl, <with|font-shape|italic|Probabilistic
    Reasoning in Intelligent Systems: Networks of Plausible Inference>,
    Morgan Kaufmann, San Mateo, California, 1988.

    <bibitem|mooij3>J. M. Mooij, <with|font-shape|italic|Understanding and
    Improving Belief Propagation>, Ph.D. Thesis, Radboud University Nijmegen,
    May 2008.

    <bibitem|mooij2>J. M. Mooij and H. J. Kappen, \PSufficient conditions for
    convergence of the sum-product algorithm,\Q <with|font-shape|italic|IEEE
    Trans. on Inform. Theory,> vol. 53, no. 12, pp. 4422-4437, December 2007.

    <bibitem|Heskes>T. Heskes, K. Albers, and B. Kappen, \PApproximate
    inference and constrained optimization,\Q <with|font-shape|italic|Proc.
    Uncertainty in AI>, August 2003.

    <bibitem|yuille>A. L. Yuille, \PA double-loop algorithm to minimize Bethe
    and Kikuchi free energies,\Q <with|font-shape|italic|Neural Computation>,
    2002.

    <bibitem|damp>M. Pretti, \PA message passing algorithm with damping,\Q
    <with|font-shape|italic|Jl. Stat. Mech.: Theory and Practice>, November
    2005.

    <bibitem|loopybp6>T. Heskes, \POn the uniqueness of loopy belief
    propagation fixed points,\Q <with|font-shape|italic|Neural Computation>,
    vol. 16, no. 11, pp. 2379-2413, November 2004.

    <bibitem|bsr>B. A. Sethuraman, B. Sundar Rajan, and V. Shashidhar,
    \PFull-diversity high-rate space-time block codes from division
    algebras,\Q <with|font-shape|italic|IEEE Trans. on Inform. Theory>, vol.
    49, no. 10, pp. 2596-2616, October 2003.

    <bibitem|blld>Y. Jiang, R. Koetter, and A. C. Singer, \POn the
    separability of demodulation and decoding for communications over
    multiple-antenna block-fading channels,\Q <with|font-shape|italic|IEEE
    Trans. on Inform. Theory>, vol. 49, no. 10, pp. 2709-2713, October 2003.

    <bibitem|cpsc1>H. Sari, G. Karam, and I. Jeanclaude, \PTransmission
    techniques for digital terrestrial TV broadcasting,\Q
    <with|font-shape|italic|IEEE Commun. Mag.,> vol. 33, no. 2, pp. 100-109,
    February 1995.

    <bibitem|cpsc2>D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and
    B. Eidson, \PFrequency domain equalization for single-carrier broadband
    wireless systems,\Q <with|font-shape|italic|IEEE Commun. Mag.,> pp.
    58-66, April 2002.

    <bibitem|cpsc3>B. Devillers, J. Louveaux, and L. Vandendorpe, \PAbout the
    diversity in cyclic prefixed single-carrier systems,\Q
    <with|font-shape|italic|Physical Communication>, pp. 266-276, 2008.

    <bibitem|griffeath>D. Griffeath, <with|font-shape|italic|Introduction to
    Markov Random Fields>, Springer, 1976.

    <bibitem|foundwain>M. J. Wainwright and M. I. Jordan,
    <with|font-shape|italic|Graphical Models, Exponential Families, and
    Variational Inference>, Foundations and Trends in Machine Learning, vol.
    1, no. 1-2, pp. 1-305, Now Publishers, 2008.

    <bibitem|gta>J. Goldberger and A. Leshem, \PMIMO detection for high-order
    QAM based on a Gaussian tree approximation,\Q
    <with|font-shape|italic|arXiv:1001.5364v1[cs.IT] 29 Jan 2010>.
  </thebibliography>
</body>