<TeXmacs|1.99.7>

<style|<tuple|ieeetran|std-latex>>

<\body>
  <\hide-preamble>
    <assign|vect|<macro|1|<with|math-font-series|bold|<arg|1>>>>

    <assign|R|<macro|<with|font-shape|slanted|mode|text|I<kern>-.2emR>>>

    <assign|n|<macro|<wide|n|\<vect\>>>>

    <assign|m|<macro|<wide|m|\<vect\>>>>

    <assign|p|<macro|<wide|p|\<vect\>>>>

    <assign|q|<macro|<wide|q|\<vect\>>>>

    <assign|r|<macro|<wide|r|\<vect\>>>>

    <assign|s|<macro|<wide|s|\<vect\>>>>

    <assign|eps|<macro|\<varepsilon\>>>

    <assign|si|<macro|\<sigma\><rsup|2>>>

    <assign|sumi|<macro|<big|sum><rsub|i=1><rsup|N>>>

    <assign|v|<macro|<wide|v|\<vect\>>>>

    <assign|a|<macro|\<alpha\>>>

    <assign|b|<macro|1|2|\<beta\><rsub|<arg|1>,<arg|2>>>>

    <assign|bvec|<macro|<wide|\<beta\>|\<vect\>>>>

    <assign|abs|<macro|1|\|<arg|1>\|>>

    <assign|half|<macro|<frac|1|2>>>

    <assign|sqr|<macro|<frac|1|<sqrt|2>>>>

    <assign|N|<macro|\<bbb-N\>>>
  </hide-preamble>

  <doc-data|<doc-title|Online Adaptive Decision Fusion Framework Based on
  Entropic Projections onto Convex Sets with Application to Wildfire
  Detection in Video>|<doc-author|<author-data|<author-inst|1>|<author-inst|2>|<author-inst|1>|<author-inst|1>|<author-name|Osman<nbsp>Günay,
  Behcet<nbsp>Uğur<nbsp>Töreyin, Kıvanç<nbsp>Köse and
  A.<nbsp>Enis<nbsp>Çetin<next-line>>|<author-data>|<author-inst|1>|<author-affiliation|Department
  of Electrical and Electronics Engineering<next-line>Bilkent University,
  Ankara, Turkey, 06800<next-line>Telephone: +90-312-290-1219 Fax:
  +90-312-266-4192<next-line>Email: {osman,kkivanc,cetin}@ee.bilkent.edu.tr>>>|<doc-date|<date|>>>

  <abstract-data|<\abstract>
    In this paper, an Entropy functional based online Adaptive Decision
    Fusion (EADF) framework is developed for image analysis and computer
    vision applications. In this framework, it is assumed that the compound
    algorithm consists of several sub-algorithms, each of which yields its
    own decision as a real number centered around zero, representing the
    confidence level of that particular sub-algorithm. Decision values are
    linearly combined with weights which are updated online according to an
    active fusion method based on performing entropic projections onto convex
    sets describing sub-algorithms. It is assumed that there is an oracle,
    who is usually a human operator, providing feedback to the decision
    fusion method. A video based wildfire detection system is developed to
    evaluate the performance of the algorithm in handling problems where
    data arrive sequentially. In this case, the oracle is the security guard
    of the forest lookout tower verifying the decision of the combined
    algorithm. Simulation results are presented. The EADF framework is also
    tested with a standard dataset.
  </abstract>>

  <with|font-series|bold|EDICS:> ARS-IIU: Image & Video Interpretation and
  Understanding; Object recognition and classification; Foreground/background
  segregation; Scene analysis

  <section|Introduction>

  <PARstart|A|n> online learning framework called Entropy functional based
  Adaptive Decision Fusion (EADF) is proposed, which can be used for various
  image analysis and computer vision applications. In this framework, it is
  assumed that the compound algorithm consists of several sub-algorithms, each
  of which yields its own decision. The final decision is taken based on a
  set of real numbers representing confidence levels of various
  sub-algorithms. Decision values are linearly combined with weights which
  are updated online using an active fusion method based on performing
  entropic projections (e-projections) onto convex sets describing
  sub-algorithms.

  Adaptive learning methods based on orthogonal projections have been
  successfully used in some computer vision and pattern recognition
  problems<nbsp><cite|osman|Theodoridis>. In this active learning approach,
  decisions from different classifiers are combined using a linear
  combiner<nbsp><cite|dasarathy>. A multiple classifier system can prove
  useful for difficult pattern recognition problems, especially when large
  class sets and noisy data are involved, because it allows the use of
  arbitrary feature descriptors and classification procedures at the same
  time<nbsp><cite|multipleclassifier>. Instead of determining the weights
  using orthogonal projections as in<nbsp><cite|osman|Theodoridis>, we
  introduce the entropic e-projection approach, which is based on a
  generalized projection onto a convex set.

  The studies in the field of collective recognition, which were started in
  the middle of the 1950s, found wide application in practice during the last
  decade, leading to solutions to complex large-scale applied
  problems<nbsp><cite|collectiverec>. One of the first examples of the use of
  multiple classifiers was given by Dasarathy in<nbsp><cite|dasarathy>, in
  which he introduced the concept of composite classifier systems as a means
  of achieving improved recognition system performance compared to employing
  the classifier components individually. The method is illustrated by
  studying the case of the linear/NN (nearest neighbor) classifier composite
  system. Kumar and Zhang used multiple classifiers for palmprint recognition
  by characterizing the user's identity through the simultaneous use of three
  major palmprint representations and achieved better performance than any
  one of them individually<nbsp><cite|Kumar2005>. A multiple classifier fusion
  algorithm is proposed for developing an effective video-based face
  recognition method<nbsp><cite|facemultiple>. Garcia and Puig present
  results showing that pixel-based texture classification can be
  significantly improved by integrating texture methods from multiple
  families, each evaluated over multisized
  windows<nbsp><cite|texturemultiple>. This technique consists of an initial
  training stage that evaluates the behavior of each considered texture
  method when applied to the given texture patterns of interest over various
  evaluation windows of different size.

  In this article, the EADF framework is applied to a computer vision based
  wildfire detection problem. The system based on this method is currently
  being used in more than 50 forest fire lookout towers. The proposed
  automatic video based wildfire detection algorithm is based on five
  sub-algorithms: (i)<nbsp>slow moving video object detection,
  (ii)<nbsp>smoke-colored region detection, (iii)<nbsp>wavelet transform
  based region smoothness detection, (iv)<nbsp>shadow detection and
  elimination, (v)<nbsp>covariance matrix based classification. Each
  sub-algorithm decides on the existence of smoke in the viewing range of the
  camera separately. Decisions from sub-algorithms are combined together by
  the adaptive decision fusion method. Initial weights of the sub-algorithms
  are determined from actual forest fire videos and test fires. They are
  updated by using entropic e-projections onto hyperplanes defined by the
  fusion weights. It is assumed that there is an oracle monitoring the
  decisions of the combined algorithm. In the wildfire detection case, the
  oracle is the security guard. Whenever a fire is detected, the decision
  should be acknowledged by the security guard. The decision algorithm will
  also produce false alarms in practice. Whenever an alarm occurs, the system
  asks the security guard to verify its decision. If it is incorrect, the
  weights are updated according to the decision of the security guard. The
  goal of the system is not to replace the security guard but to provide a
  supporting tool to help him or her. The attention span of a typical
  security guard is only 20 minutes in monitoring stations. It is also
  possible to use feedback at specified intervals and run the algorithm
  autonomously at other times. For example, the weights can be updated when
  there is no fire in the viewing range of the camera and then the system can
  be run without feedback.

  The paper is organized as follows. The Entropy functional based Adaptive
  Decision Fusion (EADF) framework is described in
  Section<nbsp><reference|sec:weight>. The first part of the section
  describes our previous weight update algorithm, which is obtained by
  orthogonal projections onto convex sets<nbsp><cite|osman>; the second part
  proposes an entropy based e-projection method for the weight update of the
  sub-algorithms. Section<nbsp><reference|sec:wildfire> introduces the video
  based wildfire detection problem. The proposed framework is not restricted
  to the wildfire detection problem. It can also be used in other real-time
  intelligent video analysis applications in which a security guard is
  available. In Section<nbsp><reference|sec:building>, each one of the five
  sub-algorithms which make up the compound (main) wildfire detection
  algorithm is described. In Section<nbsp><reference|sec:experimental>,
  experimental results are presented and the proposed online active fusion
  method is compared with the universal linear predictor and the weighted
  majority algorithms. The proposed EADF method is also evaluated on a
  dataset from the UCI machine learning repository<nbsp><cite|uci>.
  Well-known classifiers (SVM, K-NN) are combined using EADF. During the
  training stage individual decisions of classifiers are used to find the
  weight of each classifier in the composite EADF classifier. Finally,
  conclusions are drawn in Section<nbsp><reference|sec:conclusion>.

  <section|Adaptive Decision Fusion (ADF) Framework><label|sec:weight>

  Let the compound algorithm be composed of <math|M>-many detection
  sub-algorithms: <math|D<rsub|1>,...,D<rsub|M>>. Upon receiving a sample
  input <math|x> at time step <math|n>, each sub-algorithm yields a decision
  value <math|D<rsub|i><around|(|x,n|)>\<in\>\<bbb-R\>> centered around zero.
  If <math|D<rsub|i><around|(|x,n|)>\<gtr\>0>, it means that the event is
  detected by the <math|i>-th sub-algorithm. Otherwise, it is assumed that
  the event did not happen. The type of the sample input <math|x> may vary
  depending on the algorithm. It may be an individual pixel, or an image
  region, or the entire image depending on the sub-algorithm of the computer
  vision problem. For example, in the wildfire detection problem presented in
  Section<nbsp><reference|sec:wildfire>, the number of sub-algorithms is
  <math|M=5> and each pixel at the location <math|x> of the incoming image
  frame is considered as a sample input for every detection algorithm.

  Let <math|<with|font-series|bold|D><around|(|x,n|)>=<around|[|D<rsub|1><around|(|x,n|)>,...,D<rsub|M><around|(|x,n|)>|]><rsup|T>>
  be the vector of decision values of the sub-algorithms for the pixel at
  location <math|x> of the input image frame at time step <math|n>, and
  <math|<with|font-series|bold|w><around|(|x,n|)>=<around|[|w<rsub|1><around|(|x,n|)>,...,w<rsub|M><around|(|x,n|)>|]><rsup|T>>
  be the current weight vector. For simplicity we will drop <math|x> in
  <math|<with|font-series|bold|w><around|(|x,n|)>> for the rest of the paper.

  We define

  <\equation>
    <label|eq:y><wide|y|^><around|(|x,n|)>=<with|font-series|bold|D<rsup|T>><around|(|x,n|)><with|font-series|bold|w><around|(|n|)>=<big|sum><rsub|i>w<rsub|i><around|(|n|)>*D<rsub|i><around|(|x,n|)>
  </equation>

  as an estimate of the correct classification result
  <math|y<around|(|x,n|)>> of the oracle for the pixel at location <math|x>
  of the input image frame at time step <math|n>, and the error
  <math|e<around|(|x,n|)>> as <math|e<around|(|x,n|)>=y<around|(|x,n|)>-<wide|y|^><around|(|x,n|)>>.
  As will be seen in the next subsection, the main advantage of the
  proposed algorithm compared to other related methods
  in<nbsp><cite|WMA><nocite|tsmc_xu92><nocite|tsmc_kuncheva02>-<cite|tsmc_ParikhP07>
  is the controlled feedback mechanism based on the error term. The weights of
  the sub-algorithms producing incorrect (correct) decisions are reduced
  (increased) iteratively at each time step. Another advantage of the
  proposed algorithm is that it does not assume any specific probability
  distribution for the data.
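
  The linear fusion rule and the error term defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names <verbatim|fuse> and <verbatim|fusion_error> and the sample decision values are hypothetical and do not come from the paper.

```python
def fuse(decisions, weights):
    """Fused estimate y_hat = sum_i w_i * D_i for one sample."""
    return sum(w * d for w, d in zip(weights, decisions))


def fusion_error(y, decisions, weights):
    """Error e = y - y_hat, where y is the oracle's decision."""
    return y - fuse(decisions, weights)


# Hypothetical decision values of M = 5 sub-algorithms for one pixel.
D = [0.8, -0.2, 0.5, 0.1, 0.6]
w = [1.0 / 5] * 5              # uniform initial weights
y_hat = fuse(D, w)             # fused confidence, positive means "event detected"
e = fusion_error(1.0, D, w)    # oracle says the event is present (y = 1)
```

  A positive fused value is read as a detection, and the sign and size of the error determine how the weights are corrected at the next step.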

  <subsection|Set Theoretic Weight Update Algorithm based on Orthogonal
  Projections>

  In this subsection, we first review the orthogonal projection based weight
  update scheme<nbsp><cite|osman>. Ideally, weighted decision values of
  sub-algorithms should be equal to the decision value
  <math|y<around|(|x,n|)>> of the oracle:

  <\equation>
    <label|eq:hyperplane>y<around|(|x,n|)>=<with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w>
  </equation>

  which represents a hyperplane in the <math|M>-dimensional space
  <math|\<bbb-R\><rsup|M>>. Hyperplanes are closed and convex in
  <math|\<bbb-R\><rsup|M>>. At time instant <math|n>,
  <math|<with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w><around|(|n|)>>
  may not be equal to <math|y<around|(|x,n|)>>. In our approach, the next set
  of weights is determined by projecting the current weight vector
  <math|<with|font-series|bold|w><around|(|n|)>> onto the hyperplane
  represented by Eq.<nbsp><reference|eq:hyperplane>. The orthogonal
  projection <math|<with|font-series|bold|w>*<around|(|n+1|)>> of the vector
  of weights <math|<with|font-series|bold|w><around|(|n|)>\<in\>\<bbb-R\><rsup|M>>
  onto the hyperplane <math|y<around|(|x,n|)>=<with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w>>
  is the closest vector on the hyperplane to the vector
  <math|<with|font-series|bold|w><around|(|n|)>>.

  Let us formulate the problem as a minimization problem:

  <\equation>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<label|eq:min><nbsp><nbsp><nbsp><nbsp><nbsp><nbsp>min<rsub|<with|font-series|bold|w<rsup|\<ast\>>>><around|\|||\|><with|font-series|bold|w<rsup|\<ast\>>>-<with|font-series|bold|w><around|(|n|)><around|\|||\|>>>|<row|<cell|<text|subject
    to ><with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w<rsup|\<ast\>>>=y<around|(|x,n|)>>>>>>
  </equation>

  The solution can be obtained by using Lagrange multipliers. If we define
  the next set of weights as <math|<with|font-series|bold|w*<around|(|n+1|)>=w<rsup|\<ast\>>>>
  it can be obtained by the following iteration:

  <\equation>
    <label|eq:app-step><with|font-series|bold|w>*<around|(|n+1|)>=<with|font-series|bold|w><around|(|n|)>+<frac|\<lambda\>|2>*<with|font-series|bold|D><around|(|x,n|)>
  </equation>

  where the Lagrange multiplier, <math|\<lambda\>>, can be obtained from the
  hyperplane equation:

  <\equation>
    <with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w<rsup|\<ast\>>>-y<around|(|x,n|)>=0
  </equation>

  as follows:

  <\equation>
    <label|eq:lambda>\<lambda\>=2*<frac|y<around|(|x,n|)>-<wide|y|^><around|(|x,n|)>|<around|\|||\|>*<with|font-series|bold|D><around|(|x,n|)><around|\|||\|><rsup|2>>=2*<frac|e<around|(|x,n|)>|<around|\|||\|>*<with|font-series|bold|D><around|(|x,n|)><around|\|||\|><rsup|2>>
  </equation>

  where the error, <math|e<around|(|x,n|)>>, is defined as
  <math|e<around|(|x,n|)>=y<around|(|x,n|)>-<wide|y|^><around|(|x,n|)>> and
  <math|<wide|y|^><around|(|x,n|)>=<with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w><around|(|n|)>>.
  Substituting this into<nbsp>Eq.<nbsp><reference|eq:app-step>,

  <\equation>
    <label|eq:app><with|font-series|bold|w>*<around|(|n+1|)>=<with|font-series|bold|w><around|(|n|)>+<frac|e<around|(|x,n|)>|<around|\|||\|>*<with|font-series|bold|D><around|(|x,n|)><around|\|||\|><rsup|2>>*<with|font-series|bold|D><around|(|x,n|)>
  </equation>

  is obtained. Hence the projection vector is calculated according to
  Eq.<nbsp><reference|eq:app>.

  Whenever a new input arrives, another hyperplane based on the new decision
  values <math|<with|font-series|bold|D>*<around|(|x,n+1|)>> of sub-algorithms
  is defined in <math|\<bbb-R\><rsup|M>>:

  <\equation>
    <label|eq:hyperplane-next>y*<around|(|x,n+1|)>=<with|font-series|bold|D><rsup|T>*<around|(|x,n+1|)><with|font-series|bold|w<rsup|\<ast\>>>
  </equation>

  In general, this hyperplane will not be the same as the hyperplane
  <math|y<around|(|x,n|)>=<with|font-series|bold|D><rsup|T><around|(|x,n|)><with|font-series|bold|w>>.
  The next set of weights, <math|<with|font-series|bold|w>*<around|(|n+2|)>>,
  is determined by projecting <math|<with|font-series|bold|w>*<around|(|n+1|)>>
  onto the hyperplane in Eq.<nbsp><reference|eq:hyperplane-next>. Iterated
  weights converge to the intersection of hyperplanes<nbsp><cite|cetin_pocs>,<nbsp><cite|Combettes>.
  The rate of convergence can be adjusted by introducing a relaxation
  parameter <math|\<mu\>> to Eq.<nbsp><reference|eq:app> as follows

  <\equation>
    <label|eq:app-relaxed><with|font-series|bold|w>*<around|(|n+1|)>=<with|font-series|bold|w><around|(|n|)>+\<mu\><frac|e<around|(|x,n|)>|<around|\|||\|>*<with|font-series|bold|D><around|(|x,n|)><around|\|||\|><rsup|2>>*<with|font-series|bold|D><around|(|x,n|)>
  </equation>

  where <math|0\<less\>\<mu\>\<less\>2> should be satisfied to guarantee the
  convergence according to the projections onto convex sets<nbsp>(POCS)
  theory<nbsp><cite|pocs_theory><nocite|cetin_pocs2><nocite|pocs_theory><nocite|Theodoridis1>-<cite|Wornell>.
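
  The relaxed orthogonal projection step can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the function name <verbatim|pocs_update>, the handling of an all-zero decision vector, and the sample values in the usage lines are assumptions made for illustration.

```python
def pocs_update(weights, decisions, y, mu=1.0):
    """One relaxed orthogonal projection of the weights onto the
    hyperplane D^T w = y (mu = 1 lands exactly on the hyperplane)."""
    if not 0.0 < mu < 2.0:
        raise ValueError("mu must satisfy 0 < mu < 2")
    norm_sq = sum(d * d for d in decisions)
    if norm_sq == 0.0:
        # Assumption: with an all-zero decision vector there is no
        # hyperplane to project onto, so the weights are left unchanged.
        return list(weights)
    y_hat = sum(w * d for w, d in zip(weights, decisions))
    step = mu * (y - y_hat) / norm_sq
    return [w + step * d for w, d in zip(weights, decisions)]


# Hypothetical sample: five sub-algorithm decisions, oracle decision y = 1.
D = [0.8, -0.2, 0.5, 0.1, 0.6]
w_next = pocs_update([0.2] * 5, D, y=1.0, mu=1.0)
```

  With <math|\<mu\>=1> the updated weights satisfy the hyperplane equation exactly; smaller values move only part of the way, which scales the remaining error by <math|1-\<mu\>>.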

  If the intersection of hyperplanes is an empty set, then the updated weight
  vector simply satisfies the last hyperplane equation. In other words, it
  tracks decisions of the oracle by assigning proper weights to the
  individual sub-algorithms<nbsp><cite|cetin_pocs2>,<nbsp><cite|Theodoridis1>.

  The relation between support vector machines and orthogonal projections
  onto halfplanes was established in<nbsp><cite|Theodoridis1>,<nbsp><cite|Theodoridis2>
  and<nbsp><cite|Theodoridis3>. As pointed out in<nbsp><cite|Theodoridis2>,
  the SVM is very successful in batch settings, but it cannot handle online
  problems with drifting concepts in which the data arrive sequentially.

  <\specified-algorithm|The pseudo-code for the POCS based algorithm>
    \ \ <algo-state|Adaptive Decision Fusion(x,n)>

    <\algo-for|<math|i> = 1 to M>
      <algo-state|<math|w<rsub|i><around|(|0|)>> = <math|<frac|1|M>>,
      initialization>
    </algo-for>

    <algo-state|<math|<wide|y|^><around|(|x,n|)>=<big|sum><rsub|i>w<rsub|i><around|(|n|)>*D<rsub|i><around|(|x,n|)>>>

    <algo-state|<math|e<around|(|x,n|)>=y<around|(|x,n|)>-<wide|y|^><around|(|x,n|)>>>

    <\algo-for|<math|i> = 1 to M>
      <algo-state|<math|w<rsub|i>*<around|(|n+1|)>\<leftarrow\>w<rsub|i><around|(|n|)>+\<mu\><frac|e<around|(|x,n|)>|<around|\|||\|>*<with|font-series|bold|D><around|(|x,n|)><around|\|||\|><rsup|2>>*D<rsub|i><around|(|x,n|)>>>
    </algo-for>

    <algo-if-else-if|<math|<wide|y|^><around|(|x,n|)>\<geq\>0>|<algo-state|return
    1>|<algo-state|return -1>>

    <label|fig:algo>
  </specified-algorithm>

  <subsection|Entropic Projection (E-Projection) Based Weight Update
  Algorithm><label|Algorithm>

  The <math|l<rsub|1>> norm based minimization approaches provide successful
  signal reconstruction results in compressive sensing
  problems<nbsp><cite|baraniuk|candes|osher|osher2>. However, the
  <math|l<rsub|0>> and <math|l<rsub|1>> norm based cost functions used in
  compressive sensing problems are not differentiable everywhere. The entropy
  functional approximates the <math|l<rsub|1>> norm
  <math|<big|sum><rsub|i><around|\||w<rsub|i><around|(|n|)>|\|>> for
  <math|w<rsub|i><around|(|n|)>\<gtr\>0><nbsp><cite|Bregman>. Therefore, it
  can be used to find approximate solutions to the inverse problems defined
  in <cite|baraniuk|candes> and other applications requiring <math|l<rsub|1>>
  norm minimization. Bregman developed convex optimization algorithms in
  the 1960s, and his algorithms are widely used in many signal reconstruction
  and inverse problems <cite|pocs_theory|Herman|Lent|Trussell|Sezan|Combettes|enis2|enis4|Theodoridis>.
  Bregman's method provides globally convergent iterative algorithms for
  problems with convex, continuous and differentiable cost functionals
  <math|g<around|(|.|)>>:

  <\equation>
    min<rsub|<math-bf|w>\<in\>C> <space|0.27em>g<around|(|<math-bf|w>|)><label|Dg>
  </equation>

  such that

  <\equation>
    <math-bf|D><rsup|T><around|(|x,n|)><math-bf|w>=y<around|(|x,n|)><space|1em><text|for
    each time index >n<label|hyperplane>
  </equation>

  In the EADF framework the cost function is
  <math|g<around|(|<math-bf|w>|)>=<big|sum><rsub|i=1><rsup|M>w<rsub|i><around|(|n|)>*log
  <around|(|w<rsub|i><around|(|n|)>|)>>
  and each equation in (<reference|hyperplane>) represents a hyperplane
  <math|H<rsub|n>\<subset\>\<bbb-R\><rsup|M>>, which is a closed and convex set.
  In Bregman's method the iterative algorithm starts with an arbitrary
  initial estimate and successive e-projections are performed onto the
  hyperplanes <math|H<rsub|n>>, <math|n=1,2,...,N> in each step of the
  iterative algorithm.

  The e-projection onto a closed and convex set is a generalized version of
  the orthogonal projection onto a convex set<nbsp><cite|Bregman>. Let
  <math|<math-bf|w><around|(|n|)>> denote the weight vector for the
  <math|n>-th sample. Its e-projection
  <math|<math-bf|w><rsup|\<ast\>>> onto a closed convex set <math|C> with
  respect to a cost functional <math|g<around|(|<math-bf|w>|)>> is defined as
  follows

  <\equation>
    <math-bf|w><rsup|\<ast\>>=arg min<rsub|<math-bf|w>\<in\>C>
    <space|0.27em>L<around|(|<math-bf|w>,<math-bf|w><around|(|n|)>|)><label|Dproj>
  </equation>

  where

  <\equation>
    L<around|(|<math-bf|w>,<math-bf|w><around|(|n|)>|)>=g<around|(|<math-bf|w>|)>-g<around|(|<math-bf|w><around|(|n|)>|)>-<around|\<langle\>|<big|triangledown>g<around|(|<math-bf|w><around|(|n|)>|)>,<math-bf|w>-<math-bf|w><around|(|n|)>|\<rangle\>><label|Dproj1>
  </equation>

  In the adaptive learning problem, we have the hyperplane
  <math|H:<math-bf|D><rsup|T><around|(|x,n|)>.<math-bf|w><around|(|n+1|)>=y<around|(|x,n|)><space|1em>>
  for each sample <math|x>. For each hyperplane <math|H>, the e-projection
  (<reference|Dproj>) is equivalent to

  <eqnarray|<tformat|<table|<row|<cell|<big|triangledown>g<around|(|<math-bf|w><around|(|n+1|)>|)>=<big|triangledown>g<around|(|<math-bf|w><around|(|n|)>|)>+\<lambda\><with|math-font-family|bf|D<around|(|x,n|)>><eq-number>>>|<row|<cell|<math-bf|D><rsup|T><around|(|x,n|)>.<math-bf|w><around|(|n+1|)>=y<around|(|x,n|)><eq-number><label|eq15>>>>>>

  where <math|\<lambda\>> is the Lagrange multiplier. As pointed out above,
  the e-projection is a generalization of the orthogonal projection. When the
  cost functional is the Euclidean cost functional
  <math|g<around|(|<math-bf|w>|)>=<big|sum><rsub|i>w<rsub|i><around|(|n|)><rsup|2>>,
  the distance <math|L<around|(|<with|math-font-family|bf|w<rsub|1>>,<with|math-font-family|bf|w<rsub|2>>|)>>
  becomes the squared <math|l<rsub|2>> norm of the difference vector
  <math|<around|(|<with|math-font-family|bf|w<rsub|1>>-<with|math-font-family|bf|w<rsub|2>>|)>>,
  and the e-projection simply becomes the well-known orthogonal projection
  onto a hyperplane.
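
  The reduction to the squared Euclidean distance can be checked numerically. The sketch below assumes the standard form of the Bregman distance, in which the gradient is evaluated at the reference vector <math|<math-bf|w><around|(|n|)>>; all function names are hypothetical and chosen only for this illustration.

```python
import math


def bregman_distance(g, grad_g, w, w_ref):
    """L(w, w_ref) = g(w) - g(w_ref) - <grad g(w_ref), w - w_ref>."""
    inner = sum(gi * (a - b) for gi, a, b in zip(grad_g(w_ref), w, w_ref))
    return g(w) - g(w_ref) - inner


def sq_norm(w):
    """Euclidean cost functional g(w) = sum_i w_i^2."""
    return sum(x * x for x in w)


def sq_norm_grad(w):
    return [2.0 * x for x in w]


def neg_entropy(w):
    """Entropy-type cost functional g(w) = sum_i w_i log w_i (w_i > 0)."""
    return sum(x * math.log(x) for x in w)


def neg_entropy_grad(w):
    return [math.log(x) + 1.0 for x in w]
```

  For the Euclidean functional the returned value equals the squared <math|l<rsub|2>> norm of the difference vector, while for the entropy functional it equals the generalized Kullback-Leibler divergence between the two positive weight vectors.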

  When the cost functional is the entropy functional
  <math|g<around|(|<math-bf|w>|)>=<big|sum><rsub|i>w<rsub|i><around|(|n|)>*log
  <around|(|w<rsub|i><around|(|n|)>|)>>, the e-projection onto the hyperplane
  <math|H> leads to the following update equations:

  <\equation>
    w<rsub|i>*<around|(|n+1|)>=w<rsub|i><around|(|n|)>*e<rsup|\<lambda\>*D<rsub|i><around|(|x,n|)>>,<space|0.27em>i=1,2,...,M<label|entropicupdate1>
  </equation>

  where the Lagrange multiplier <math|\<lambda\>> is obtained by inserting
  (<reference|entropicupdate1>) into the hyperplane equation:

  <\equation>
    <math-bf|D><rsup|T><around|(|x,n|)><math-bf|w><around|(|n+1|)>=y<around|(|x,n|)><label|entropicupdate2>
  </equation>

  because the e-projection <math|<math-bf|w><around|(|n+1|)>> must be on the
  hyperplane <math|H> in Eq.<nbsp><reference|eq15>. This globally convergent
  iterative process is depicted in Fig.<nbsp><reference|fig:geo>.

  <\big-figure>
    <frame|<image|geo_int.eps|8.0cm|||>><label|fig:geo>
  </big-figure|Geometric interpretation of the entropic projection method:
  Weight vectors corresponding to decision functions at each frame are
  updated so as to satisfy the hyperplane equations defined by the oracle's
  decision <math|y<around|(|x,n|)>> and the decision vector
  <math|<with|font-series|bold|D><around|(|x,n|)>>. Lines in the figure
  represent hyperplanes in <math|\<bbb-R\><rsup|M>>. Weight update vectors
  converge to the intersection of the hyperplanes. Notice that e-projections
  are not orthogonal projections.>

  The above set of equations is used in signal reconstruction from Fourier
  Transform samples and in the tomographic reconstruction
  problem<nbsp><cite|Herman|cetin_pocs2>. The entropy functional is defined
  only for positive real numbers, which is consistent with our positive weight
  assumption.

  The pseudo-code for the e-projection based adaptive decision fusion
  algorithm is given in Algorithm<nbsp><reference|fig:algent>. To find the
  <math|\<lambda\>> value that minimizes the squared error at each iteration,
  either a simple search over candidate <math|\<lambda\>> values can be
  performed or the nonlinear equation obtained from
  Eqs.<nbsp><reference|entropicupdate1> and<nbsp><reference|entropicupdate2>
  can be solved.

  <\specified-algorithm|The pseudo-code for the EADF algorithm>
    \ \ <algo-state|E-Projection Based Adaptive Decision Fusion(x,n)>

    <\algo-for|<math|i> = 1 to M>
      <algo-state|<math|w<rsub|i><around|(|0|)>> = <math|<frac|1|M>>,
      initialization>
    </algo-for>

    <algo-state|<math|<with|math-font-family|bf|w<rsub|T>><around|(|n+1|)>\<leftarrow\><math-bf|w><around|(|n|)>>>

    <\algo-for|<math|\<lambda\>> = <math|\<lambda\><rsub|min>> to
    <math|\<lambda\><rsub|max>>>
      <\algo-for|<math|i> = 1 to M>
        <algo-state|<math|v<rsub|i><around|(|n|)>=w<rsub|i><around|(|n|)>>>

        <algo-state|<math|v<rsub|i>*<around|(|n+1|)>\<leftarrow\>v<rsub|i><around|(|n|)>*e<rsup|\<lambda\>*D<rsub|i><around|(|x,n|)>>>>
      </algo-for>

      <algo-if-else-if|<math|<around|\|||\|>*y<around|(|x,n|)>-<big|sum><rsub|i>v<rsub|i>*<around|(|n+1|)>*D<rsub|i><around|(|x,n|)><around|\|||\|>\<less\><around|\|||\|>*y<around|(|x,n|)>-<big|sum><rsub|i>w<rsub|T,i>*<around|(|n+1|)>*D<rsub|i><around|(|x,n|)><around|\|||\|>>
      |<algo-state|<math|<with|math-font-family|bf|w<rsub|T>><around|(|n+1|)>\<leftarrow\><math-bf|v><around|(|n+1|)>>>>
    </algo-for>

    <algo-state|<math|<math-bf|w><around|(|n+1|)>\<leftarrow\><with|math-font-family|bf|w<rsub|T>><around|(|n+1|)>>>

    <algo-state|<math|<wide|y|^><around|(|x,n|)>=<big|sum><rsub|i>w<rsub|i><around|(|n|)>*D<rsub|i><around|(|x,n|)>>>

    <algo-if-else-if|<math|<wide|y|^><around|(|x,n|)>\<geq\>0>|<algo-state|return
    1>|<algo-state|return -1>>

    <label|fig:algent>
  </specified-algorithm>
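
  As an alternative to the grid search over <math|\<lambda\>>, the nonlinear equation obtained from the multiplicative update and the hyperplane constraint can be solved by bisection, since its left-hand side is nondecreasing in <math|\<lambda\>> for positive weights. The sketch below is one possible realization under stated assumptions: the function name <verbatim|eadf_update>, the search interval, the iteration count, and the fallback when the interval contains no root are all hypothetical choices, not taken from the paper.

```python
import math


def eadf_update(weights, decisions, y, lam_min=-10.0, lam_max=10.0, iters=60):
    """Entropic projection w_i <- w_i * exp(lam * D_i), with lam chosen so
    that sum_i w_i(n+1) * D_i = y. Bisection replaces the grid search."""

    def residual(lam):
        return sum(w * d * math.exp(lam * d)
                   for w, d in zip(weights, decisions)) - y

    lo, hi = lam_min, lam_max
    r_lo, r_hi = residual(lo), residual(hi)
    if r_lo * r_hi > 0.0:
        # Assumed fallback: no root inside the interval, so keep the
        # endpoint that gives the smaller absolute error.
        lam = lo if abs(r_lo) < abs(r_hi) else hi
    else:
        # residual() is nondecreasing in lam when all weights are positive.
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if residual(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
    return [w * math.exp(lam * d) for w, d in zip(weights, decisions)]


# Hypothetical sample: five sub-algorithm decisions, oracle decision y = 1.
D = [0.8, -0.2, 0.5, 0.1, 0.6]
w_next = eadf_update([0.2] * 5, D, y=1.0)
```

  Because the update is multiplicative, weights that start positive remain positive, in line with the positive weight assumption of the entropy functional.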

  <section|An Application: Computer Vision Based Wildfire
  Detection><label|sec:wildfire>

  The Entropy functional based Adaptive Decision Fusion (EADF) framework
  described in detail in the previous section has a tracking capability, which
  is especially useful when the online active learning problem is of dynamic
  nature with drifting concepts<nbsp><cite|schlimmer><nocite|Polikar_1>-<cite|SMC_Conf3>.
  In the video based wildfire detection problem introduced in this section,
  the nature of forest recordings varies over time due to weather conditions
  and changes in illumination, which makes it necessary to deploy an adaptive
  wildfire detection system. It is not feasible to develop one strong fusion
  model with fixed weights in this setting with drifting nature. An ideal
  online active learning mechanism should keep track of drifts in video and
  adapt itself accordingly. The projections in
  Eq.<nbsp><reference|entropicupdate1> and Eq.<nbsp><reference|eq:app> adjust
  the importance of individual sub-algorithms by updating the weights
  according to the decisions of the oracle.

  Manned lookout posts are widely available in forests all around the world
  to detect wildfires. Surveillance cameras can be placed in these
  lookout towers to monitor the surrounding forest area for possible
  wildfires. Furthermore, they can be used to monitor the progress of the
  fire from remote centers.

  As an application of EADF, a computer vision based method for wildfire
  detection is presented in this article. Security guards have to work 24
  hours in remote locations under difficult circumstances. They may simply
  get tired or leave the lookout tower for various reasons. Therefore,
  computer vision based video analysis systems capable of producing automatic
  fire alarms are necessary to help the security guards to reduce the average
  forest fire detection time.

  Cameras, once installed, operate at forest watch towers throughout the fire
  season, which lasts about six months and is mostly dry and sunny in the
  Mediterranean region. There is usually a guard in charge of the cameras as
  well. The guard can supply feedback to the detection algorithm after the
  installation of the system. Whenever an alarm is issued, she/he can verify
  it or reject it. In this way, she/he can participate in the learning
  process of the adaptive algorithm. The proposed active fusion algorithm can
  also be used in other supervised learning problems where combining
  classifiers through feedback is required.

  As described in the following section, the main wildfire detection
  algorithm is composed of five sub-algorithms. Each algorithm has its own
  decision function yielding a zero-mean real number for slow moving regions
  at every image frame of a video sequence. Decision values from
  sub-algorithms are linearly combined and weights of sub-algorithms are
  adaptively updated in our approach.

  There are several approaches to automatic forest fire detection in the
  literature. Some of the approaches are directed towards detection of the
  flames using infra-red and/or visible-range cameras and some others aim at
  detecting the smoke due to wildfire<nbsp><cite|ollero2008><nocite|Li2005><nocite|bosch2007><nocite|demirel>-<cite|fransizlarOE>.
  There are recent papers on sensor based fire
  detection<nbsp><cite|Hefeeda><nocite|Sahin>-<cite|SMC_Conf1>. Infrared
  cameras and sensor based systems have the ability to capture the rise in
  temperature; however, they are much more expensive compared to regular
  pan-tilt-zoom (PTZ) cameras. An intelligent space framework is described
  for indoor fire detection in<nbsp><cite|SMC_Conf2>. In this paper, by
  contrast, an outdoor (forest) wildfire detection method is proposed.

  It is almost impossible to view flames of a wildfire from a camera mounted
  on a forest watch tower unless the fire is very near to the tower. However,
  smoke rising up in the forest due to a fire is usually visible from long
  distances. A snapshot of typical wildfire smoke captured by a lookout
  tower camera from a distance of 5<nbsp>km is shown in
  Fig.<nbsp><reference|fig:snapshot>.

  Guillemant and Vicente<nbsp><cite|fransizlarOE> based their method on the
  observation that the movements of various patterns like smoke plumes
  produce correlated temporal segments of gray-level pixels. They utilized
  fractal indexing using a space-filling Z-curve concept along with
  instantaneous and cumulative velocity histograms for possible smoke
  regions. They made decisions about the existence of smoke according
  to the standard deviation, minimum average energy, and shape and smoothness
  of these histograms. It is possible to include most of the currently
  available methods as sub-algorithms in the proposed framework and combine
  their decisions using the proposed EADF method.

  <big-figure|<image|snapshot_2.eps|8.5cm|||><label|fig:snapshot>|Snapshot of
  typical wildfire smoke captured by a forest watch tower camera which is
  5<nbsp>km away from the fire (rising smoke is marked with an arrow).>

  Smoke at far distances (<math|\<gtr\>100><nbsp>m from the camera) exhibits
  different spatio-temporal characteristics than nearby smoke and
  fire<nbsp><cite|icip05><nocite|icassp05>-<cite|patrec>. This demands
  specific methods explicitly developed for smoke detection at far distances
  rather than using nearby smoke detection methods described
  in<nbsp><cite|eusipco2005>. The proposed approach is in accordance with the
  `weak' Artificial Intelligence (AI) framework<nbsp><cite|Pavlidis>
  introduced by Hubert<nbsp>L.<nbsp>Dreyfus as opposed to `generalized' AI.
  According to this framework each specific problem in AI should be addressed
  as an individual engineering problem with its own
  characteristics<nbsp><cite|Dreyfus72>,<nbsp><cite|Dreyfus92>.

  <section|Building Blocks of Wildfire Detection
  Algorithm><label|sec:building>

  The wildfire detection algorithm is developed to recognize the existence of
  wildfire smoke within the viewing range of a camera monitoring forested
  areas. The proposed wildfire smoke detection algorithm consists of five
  main sub-algorithms: (i)<nbsp>slow moving object detection in video,
  (ii)<nbsp>smoke-colored region detection, (iii)<nbsp>wavelet transform
  based region smoothness detection, (iv)<nbsp>shadow detection and
  elimination, (v)<nbsp>covariance matrix based classification, with decision
  functions, <math|D<rsub|1><around|(|x,n|)>>,
  <math|D<rsub|2><around|(|x,n|)>>, <math|D<rsub|3><around|(|x,n|)>>,
  <math|D<rsub|4><around|(|x,n|)>> and <math|D<rsub|5><around|(|x,n|)>>,
  respectively, for each pixel at location <math|x> of every incoming image
  frame at time step <math|n>. Computationally efficient sub-algorithms are
  selected in order to realize a real-time wildfire detection system running
  on a standard PC. The decision functions are combined in a linear manner
  and the weights are determined according to the weight update mechanism
  described in Section<nbsp><reference|sec:weight>.

  The decision functions <math|D<rsub|i>,<nbsp><nbsp>i=1,...,M> of the
  sub-algorithms do not produce binary values <math|1> (correct) or
  <math|-1> (false); instead, they produce real numbers centered around zero
  for each incoming sample <math|x>. If the number is positive (negative),
  then the individual algorithm decides that there is (not) smoke due to
  forest fire in the viewing range of the camera. The output values of the
  decision functions express the confidence level of each sub-algorithm: the
  larger the magnitude, the more confident the algorithm.

  The first four sub-algorithms are described in detail
  in<nbsp><cite|toretez>, which is available online at the EURASIP webpage.
  We recently added the fifth sub-algorithm to our system; it is briefly
  reviewed below.

  <subsection|Covariance Matrix Based Region Classification>

  The fifth sub-algorithm deals with the classification of the smoke colored
  moving regions. A region covariance matrix<nbsp><cite|porikli> consisting
  of discriminative features is calculated for each region. For each pixel in
  the region, a 9-dimensional feature vector <math|z<rsub|k>> is calculated
  as follows:

  <align|<tformat|<table|<row|<cell|z<rsub|k>=<around*|[|x<rsub|1>*<space|0.27em>x<rsub|2>*<space|0.27em>Y<around|(|x<rsub|1>,x<rsub|2>|)>*<space|0.27em>U<around|(|x<rsub|1>,x<rsub|2>|)>*<space|0.27em>V<around|(|x<rsub|1>,x<rsub|2>|)><space|1em><space|1em><space|1em>|\<nobracket\>>>>|<row|<cell|<around*|\<nobracket\>|<no-number><around*|\||<frac|d*Y<around|(|x<rsub|1>,x<rsub|2>|)>|dx<rsub|1>>|\|><space|0.27em><around*|\||<frac|d*Y<around|(|x<rsub|1>,x<rsub|2>|)>|dx<rsub|2>>|\|><space|0.27em><around*|\||<frac|d<rsup|2>*Y<around|(|x<rsub|1>,x<rsub|2>|)>|d*x<rsub|1><rsup|2>>|\|><space|0.27em><around*|\||<frac|d<rsup|2>*Y<around|(|x<rsub|1>,x<rsub|2>|)>|d*x<rsub|2><rsup|2>>|\|>|]><rsup|T>>>|<row|<cell|<no-number>>>>>>

  where <math|k> is the label of a pixel,
  <math|<around|(|x<rsub|1>,x<rsub|2>|)>> is the location of the pixel,
  <math|Y,U,V> are the components of the representation of the pixel in YUV
  color space, <math|<frac|d*Y<around|(|x<rsub|1>,x<rsub|2>|)>|d*x<rsub|1>>>
  and <math|<frac|d*Y<around|(|x<rsub|1>,x<rsub|2>|)>|d*x<rsub|2>>> are the
  horizontal and vertical derivatives of the region respectively, calculated
  using the filter [-1 0 1], <math|<frac|d<rsup|2>*Y<around|(|x<rsub|1>,x<rsub|2>|)>|dx<rsub|1><rsup|2>>>
  and <math|<frac|d<rsup|2>*Y<around|(|x<rsub|1>,x<rsub|2>|)>|dx<rsub|2><rsup|2>>>
  are the horizontal and vertical second derivatives of the region calculated
  using the filter [-1 2 -1], respectively.
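
  As a minimal sketch of the feature extraction described above, the Python
  function below computes <math|z<rsub|k>> for a single interior pixel. The
  function name, the list-of-rows image layout and the omission of border
  handling are illustrative assumptions and are not taken from the original
  implementation.

```python
def feature_vector(Y, U, V, x1, x2):
    """Return the 9-dimensional feature z_k of the interior pixel (x1, x2).

    Y, U, V are lists of rows (row index x2, column index x1).
    Border pixels are not handled in this sketch.
    """
    # First derivatives of the luminance with the filter [-1 0 1].
    dx1 = Y[x2][x1 + 1] - Y[x2][x1 - 1]
    dx2 = Y[x2 + 1][x1] - Y[x2 - 1][x1]
    # Second derivatives of the luminance with the filter [-1 2 -1].
    dxx1 = -Y[x2][x1 - 1] + 2 * Y[x2][x1] - Y[x2][x1 + 1]
    dxx2 = -Y[x2 - 1][x1] + 2 * Y[x2][x1] - Y[x2 + 1][x1]
    # Only the absolute values of the derivatives enter the feature vector.
    return [x1, x2, Y[x2][x1], U[x2][x1], V[x2][x1],
            abs(dx1), abs(dx2), abs(dxx1), abs(dxx2)]
```

  Because only the absolute values of the filter outputs are used, the sign
  convention of the two filters does not affect the resulting features.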

  The feature vector for each pixel can be represented as follows:

  <\equation>
    z<rsub|k>=<around|[|z<rsub|k><around|(|i|)>|]><rsup|T>
  </equation>

  where <math|z<rsub|k><around|(|i|)>> is the <math|i>-th entry of the
  feature vector. This feature vector is used to calculate the 9 by 9
  covariance matrix of the regions using the fast covariance matrix
  computation formula<nbsp><cite|tuzel>:

  <\equation>
    C<rsub|R>=<around|[|c<rsub|R><around|(|i,j|)>|]>=<around*|(|<frac|1|n-1>*<around*|[|<big|sum><rsub|k=1><rsup|n>z<rsub|k><around|(|i|)>*z<rsub|k><around|(|j|)>-Z<rsub|k*k>|]>|)>
  </equation>

  where

  <\equation>
    <no-number>*Z<rsub|k*k>=<frac|1|n>*<big|sum><rsub|k=1><rsup|n>z<rsub|k><around|(|i|)>*<big|sum><rsub|k=1><rsup|n>z<rsub|k><around|(|j|)>
  </equation>

  and <math|n> is the total number of pixels in the region and
  <math|c<rsub|R><around|(|i,j|)>> is the <math|<around|(|i,j|)>>-th
  component of the covariance matrix.
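
  The fast covariance computation above can be sketched in Python as
  follows. This is an illustration only: it accumulates the first-order sums
  and the second-order product sums in a single pass over the pixels and
  then applies the formula for <math|c<rsub|R><around|(|i,j|)>>. The
  function name and the list-based data layout are assumptions.

```python
def region_covariance(features):
    """Covariance matrix C_R of a region from its per-pixel feature vectors.

    features: list of n feature vectors z_k, all of the same length d.
    c_R(i,j) = (sum_k z_k(i) z_k(j) - (1/n) sum_k z_k(i) sum_k z_k(j)) / (n - 1)
    """
    n = len(features)
    d = len(features[0])
    sums = [0.0] * d                        # sum_k z_k(i)
    prods = [[0.0] * d for _ in range(d)]   # sum_k z_k(i) z_k(j)
    for z in features:
        for i in range(d):
            sums[i] += z[i]
            for j in range(d):
                prods[i][j] += z[i] * z[j]
    return [[(prods[i][j] - sums[i] * sums[j] / n) / (n - 1)
             for j in range(d)] for i in range(d)]
```

  For the three two-dimensional vectors (1, 2), (3, 6) and (5, 10), this
  sketch returns the variances 4 and 16 and the cross-covariance 8.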

  The region covariance matrices are symmetric; therefore, we only need half
  of the elements of the matrix for classification. We also do not need the
  first 3 elements <math|c<rsub|R><around|(|1,1|)>,c<rsub|R><around|(|2,1|)>,c<rsub|R><around|(|2,2|)>>
  when using the lower triangular elements of the matrix, because these are
  the same for all regions. Then, we need a feature vector <math|f<rsub|R>>
  with <math|9\<times\>10/2-3=42> elements for each region. For a given
  region, the length of the final feature vector does not depend on the
  number of pixels in the region; it only depends on the number of features
  in <math|z<rsub|k>>.
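
  The construction of <math|f<rsub|R>> can be illustrated with the short
  Python sketch below. The row-by-row ordering of the lower triangular
  entries and the function name are assumptions made for this example; the
  text only fixes which three entries are discarded.

```python
def region_feature(C):
    """Stack the lower triangular entries of a symmetric matrix C row by row
    and drop the first three: c(1,1), c(2,1) and c(2,2)."""
    lower = [C[i][j] for i in range(len(C)) for j in range(i + 1)]
    return lower[3:]
```

  For a 9 by 9 covariance matrix the list has 45 entries before the first
  three are dropped, which leaves the 42 elements stated above.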

  <big-table|<label|table:svm> <vspace|0.5cm><with|par-mode|center|<tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|3|4|cell-tborder|1ln>|<cwith|1|1|1|1|cell-col-span|2>|<cwith|1|1|1|1|cell-halign|c>|<cwith|1|1|1|1|cell-lborder|0ln>|<cwith|1|1|1|1|cell-rborder|1ln>|<cwith|1|1|3|3|cell-col-span|2>|<cwith|1|1|2|2|cell-rborder|1ln>|<cwith|1|1|3|3|cell-halign|c>|<cwith|1|1|3|3|cell-rborder|1ln>|<cwith|1|1|3|4|cell-bborder|1ln>|<cwith|1|1|3|4|cell-bborder|1ln>|<cwith|2|2|1|1|cell-col-span|2>|<cwith|2|2|1|1|cell-halign|c>|<cwith|2|2|1|1|cell-lborder|0ln>|<cwith|2|2|1|1|cell-rborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|2|4|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|>|<cell|Predicted
  Labels>|<cell|>>|<row|<cell|>|<cell|>|<cell|Not
  Smoke>|<cell|Smoke>>|<row|<cell|Actual>|<cell|Not
  Smoke>|<cell|11342 (97.2%)>|<cell|327 (2.8%)>>|<row|<cell|Labels>|<cell|Smoke>|<cell|49
  (0.7%)>|<cell|6962 (99.3%)>>>>>>|Confusion matrix of the training set>

  A Support Vector Machine (SVM) with RBF kernel is trained with the region
  covariance feature vectors of smoke regions in the training database. A
  total of 18680 images are used to train the SVM: 7011 of them are positive
  images that contain actual smoke, and the rest are negative images that do
  not contain smoke. Sample positive and negative images are shown in
  Fig.<nbsp><reference|myfigure>. The confusion matrix for the training set
  is given in Table<nbsp><reference|table:svm>. The success rate is
  <math|99.3%> for the positive images and <math|97.2%> for the negative
  images.

  <\big-figure>
    <\with|par-mode|center>
      <subfigure*|Negative training images.|<image|negs.eps|8.5cm|||> >

      <subfigure*|Positive training images|<image|poses.eps|8.5cm|||>
      ><label|myfigure>
    </with>
  </big-figure|Positive and negative images from the training set.>

  The LIBSVM<nbsp><cite|libsvm> software library is used to obtain the
  posterior class probabilities, <math|p<rsub|R>=<text|Pr><around|(|<text|label>=1\|f<rsub|R>|)>>,
  where <math|<text|label>=1> corresponds to a smoke region. In this software
  library, posterior class probabilities are estimated by approximating the
  posteriors with a sigmoid function as in<nbsp><cite|platt>. If the
  posterior probability is larger than <math|0.5> the label is 1 and the
  region contains smoke according to the covariance descriptor. The decision
  function for this sub-algorithm is defined as follows:

  <\equation>
    <label|eq:svm>D<rsub|5><around|(|x,n|)>=2*p<rsub|R>-1
  </equation>

  where <math|0\<less\>p<rsub|R>\<less\>1> is the estimated posterior
  probability that the region contains smoke. In<nbsp><cite|porikli>, a
  distance measure based on eigenvalues is used to compare covariance
  matrices, but we found that individual covariance values also provide
  satisfactory results in this problem.

  As pointed out above, the decision results of the five sub-algorithms,
  <math|D<rsub|1>>, <math|D<rsub|2>>, <math|D<rsub|3>>, <math|D<rsub|4>> and
  <math|D<rsub|5>>, are linearly combined to reach a final decision on
  whether a given pixel belongs to a smoke region. Morphological operations
  are applied to the detected pixels to mark the smoke regions. The number of
  connected smoke pixels should be larger than a threshold to issue an alarm
  for the region. If a false alarm is issued during the training phase, the
  oracle gives feedback to the algorithm by declaring a no-smoke decision
  value (<math|y=-1>) for the false alarm region. Initially, equal weights
  are assigned to each sub-algorithm. There may be large variations between
  forested areas, and substantial temporal changes may occur within the same
  forested region. As a result, the weights of individual sub-algorithms
  evolve in a dynamic manner over time.

  In real-time operating mode, the PTZ cameras are in continuous scan mode,
  visiting predefined preset locations. In this mode, constant monitoring by
  the oracle can be relaxed by adjusting the weights for each preset once and
  then using the same weights for successive classifications. Since the main
  issue is to reduce false alarms, the weights can be updated when there is
  no smoke in the viewing range of each preset; after that, the system
  becomes autonomous. The cameras stop at each preset and run the detection
  algorithm for some time before moving to the next preset. By calculating
  separate weights for each preset, we are able to reduce false alarms.

  <section|Experimental Results><label|sec:experimental>

  <subsection|Experiments on wildfire detection>

  The proposed wildfire detection scheme with the entropy functional based
  active learning method is implemented on a PC with a 2.6<nbsp>GHz Intel
  Core Duo processor and tested with forest surveillance recordings captured
  from cameras mounted on top of forest watch towers near the Antalya and
  Mugla provinces in the Mediterranean region of Turkey. The weather is
  stable, with sunny days throughout the entire summer in the Mediterranean
  region. If it happens to rain, there is no possibility of forest fire.
  <with|font-shape|italic|The installed system successfully detected three
  forest fires in the summer of 2008>. The system was also independently
  tested by the Regional Technology Clearing House of San Diego State
  University in California in April 2009; it detected the test fire and did
  not produce any false alarms. A snapshot from this test is presented in
  Fig.<nbsp><reference|fig:san>. The system also detected another forest
  fire in Cyprus in 2010.

  <big-figure|<image|sandi.eps|8.5cm|||><label|fig:san> |A snapshot from an
  independent test of the system by the Regional Technology Clearing House of
  San Diego State University in California in April 2009. The system
  successfully detected the test fire and did not produce any false alarms.
  The detected smoke regions are marked with bounding rectangles.>

  The proposed EADF strategy is compared with the projection onto convex
  sets (POCS) based algorithm and the universal linear predictor (ULP) scheme
  proposed by Singer and Feder<nbsp><cite|singer_feder>. The ULP adaptive
  filtering method is adapted to the wildfire detection problem in an online
  learning framework. In the ULP scheme, decisions of individual algorithms
  are linearly combined similar to Eq.<nbsp><reference|eq:y> as follows:

  <\equation>
    <wide|y|^><rsub|u><around|(|x,n|)>=<big|sum><rsub|i>v<rsub|i><around|(|n|)>*D<rsub|i><around|(|x,n|)>
  </equation>

  where the weights, <math|v<rsub|i><around|(|n|)>>, are updated according to
  the ULP algorithm, which assumes that the data (or decision values
  <math|D<rsub|i><around|(|x,n|)>>, in our case) are governed by some unknown
  probabilistic model <math|P><nbsp><cite|singer_feder>. The objective of a
  universal predictor is to minimize the expected cumulative loss. An
  explicit description of the weights, <math|v<rsub|i><around|(|n|)>>, of the
  ULP algorithm is given as follows:

  <\equation>
    v<rsub|i><around|(|n+1|)>=<frac|exp(-<frac|1|2*c>*\<ell\><around|(|y<around|(|x,n|)>,D<rsub|i><around|(|x,n|)>|)>)|<big|sum><rsub|j>exp(-<frac|1|2*c>*\<ell\><around|(|y<around|(|x,n|)>,D<rsub|j><around|(|x,n|)>|)>)>
  </equation>

  where <math|c> is a normalization constant and the loss function for the
  <math|i>-th decision function is:

  <\equation>
    \<ell\><around|(|y<around|(|x,n|)>,D<rsub|i><around|(|x,n|)>|)>=<around|[|y<around|(|x,n|)>-D<rsub|i><around|(|x,n|)>|]><rsup|2>
  </equation>

  The constant <math|c> is taken as <math|4> as indicated
  in<nbsp><cite|singer_feder>. The universal predictor based algorithm is
  summarized in Algorithm<nbsp><reference|fig:algo>.

  <\specified-algorithm|The pseudo-code for the universal predictor>
    \ <algo-state|Universal Predictor(x,n)>

    <\algo-for|<math|i> = 1 to M>
      <algo-state|<math|\<ell\><around|(|y<around|(|x,n|)>,D<rsub|i><around|(|x,n|)>|)>=<around|[|y<around|(|x,n|)>-D<rsub|i><around|(|x,n|)>|]><rsup|2>>>

      <algo-state|<math|v<rsub|i><around|(|n+1|)>=<frac|exp(-<frac|1|2*c>*\<ell\><around|(|y<around|(|x,n|)>,D<rsub|i><around|(|x,n|)>|)>)|<big|sum><rsub|j>exp(-<frac|1|2*c>*\<ell\><around|(|y<around|(|x,n|)>,D<rsub|j><around|(|x,n|)>|)>)>>>
    </algo-for>

    <algo-state|<math|<wide|y|^><rsub|u><around|(|x,n|)>=<big|sum><rsub|i>v<rsub|i><around|(|n|)>*D<rsub|i><around|(|x,n|)>>>

    <algo-if-else-if|<math|<wide|y|^><rsub|u><around|(|x,n|)>\<geq\>0>|<algo-state|return
    1>|<algo-state|return -1>>

    <label|fig:algo>
  </specified-algorithm>
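
  The universal predictor update summarized above can be sketched in Python
  as follows. This is a minimal illustration under stated assumptions: the
  function names are hypothetical, the decisions of the sub-algorithms are
  passed as a plain list, and the default value of <math|c> is the value
  <math|4> quoted in the text.

```python
import math

def ulp_weights(y, decisions, c=4.0):
    """Weights v_i(n+1) from the oracle label y and the decisions D_i(x,n)."""
    # Squared-error loss of each sub-algorithm against the oracle decision.
    losses = [(y - d) ** 2 for d in decisions]
    # Exponential weighting of the losses, normalized to sum to one.
    scores = [math.exp(-loss / (2.0 * c)) for loss in losses]
    total = sum(scores)
    return [s / total for s in scores]

def ulp_decision(weights, decisions):
    """Return 1 (smoke) or -1 (no smoke) from the weighted sum of decisions."""
    y_hat = sum(v * d for v, d in zip(weights, decisions))
    # copysign maps a zero sum to 1, matching the non-negative case of the
    # pseudo-code.
    return int(math.copysign(1, y_hat))
```

  With <math|y=-1> and two sub-algorithms deciding <math|-1> and <math|1>,
  the sub-algorithm that agrees with the oracle receives the larger weight,
  and the combined decision is <math|-1>.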

  The POCS based scheme, the ULP based scheme, the EADF based scheme, and the
  non-adaptive approach with fixed weights are compared in the following
  experiments. In Tables<nbsp><reference|forest><nbsp>and<nbsp><reference|forest>,
  6-hour-long forest surveillance recordings containing actual forest fires
  and test fires as well as video sequences with no fires are used.

  We have 7 test fire videos captured at ranges from 1<nbsp>km to 4<nbsp>km
  in the Antalya and Mugla provinces in the Mediterranean region of Turkey,
  in the summers of 2007 and 2008. To the best of our knowledge, this is the
  largest database of forest fire clips covering the initial stages of
  wildfires. The database is also used by the European Commission funded
  project FIRESENSE<nbsp><cite|firesense>. All of the above mentioned
  decision fusion methods detect forest fires within about
  20<nbsp>seconds, as shown in Table<nbsp><reference|forest>. The detection
  rates of the methods are
  comparable to each other. On the other hand, the proposed adaptive fusion
  strategy significantly reduces the false alarm rate of the system by
  integrating the feedback from the guard (oracle) into the decision
  mechanism within the active learning framework described in
  Section<nbsp><reference|sec:weight>. Fig.<nbsp><reference|fig:false> shows
  a typical false alarm from the clip <math|V12>, issued for moving tree
  leaves (which cause the white background to appear as moving smoke) by an
  untrained algorithm with decision weights equal to <math|<frac|1|5>>. The
  proposed algorithm does not produce a false alarm in this video.

  <\big-figure>
    <image|false_alarm_.eps|8.5 cm|||><label|fig:false>
  </big-figure|False alarm from clip <math|V12>. Moving tree leaves in a
  forested area cause a false alarm (depicted as a bounding box) in an
  untrained algorithm with decision weights equal to <math|<frac|1|5>>. The
  proposed algorithm does not produce a false alarm in this video.>

  <\big-table>
    \ <vspace|0.0cm>

    <with|font-size|0.84|<assign|arraystretch|<macro|1.5>>
    <assign|tabcolsep|<macro|2pt>><tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|7|7|cell-halign|c>|<cwith|1|-1|7|7|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|4|4|cell-col-span|4>|<cwith|1|1|4|4|cell-halign|c>|<cwith|1|1|4|4|cell-rborder|1ln>|<cwith|1|1|4|7|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<cwith|5|5|1|-1|cell-bborder|1ln>|<cwith|6|6|1|-1|cell-bborder|1ln>|<cwith|7|7|1|-1|cell-bborder|1ln>|<cwith|8|8|1|-1|cell-bborder|1ln>|<cwith|9|9|1|-1|cell-bborder|1ln>|<cwith|10|10|1|-1|cell-bborder|1ln>|<cwith|11|11|1|-1|cell-bborder|1ln>|<table|<row|<cell|Video>|<cell|Range>|<cell|Capture>|<cell|Frame
    number of first alarm>|<cell|>|<cell|>|<cell|>>|<row|<cell|Sequence>|<cell|(km)>|<cell|Frame
    Rate>|<cell|POCS>|<cell|Universal>|<cell|Fixed>|<cell|EADF>>|<row|<cell|>|<cell|>|<cell|(fps)>|<cell|Based>|<cell|>|<cell|Weights>|<cell|Based>>|<row|<cell|V1>|<cell|1>|<cell|7>|<cell|47>|<cell|48>|<cell|47>|<cell|42>>|<row|<cell|V2>|<cell|3>|<cell|7>|<cell|135>|<cell|142>|<cell|141>|<cell|125>>|<row|<cell|V3>|<cell|3>|<cell|7>|<cell|130>|<cell|101>|<cell|37>|<cell|134>>|<row|<cell|V4>|<cell|4>|<cell|25>|<cell|160>|<cell|154>|<cell|70>|<cell|150>>|<row|<cell|V5>|<cell|3>|<cell|9>|<cell|65>|<cell|55>|<cell|57>|<cell|56>>|<row|<cell|V6>|<cell|2>|<cell|5>|<cell|70>|<cell|74>|<cell|76>|<cell|75>>|<row|<cell|V7>|<cell|2>|<cell|5>|<cell|93>|<cell|74>|<cell|41>|<cell|83>>|<row|<cell|Average>|<cell|->|<cell|->|<cell|112.85>|<cell|92.57>|<cell|67>|<cell|97.85>>>>>
    >

    <label|forest>
  </big-table|Frame numbers at which an alarm is issued with different
  methods for wildfire smoke captured at various ranges and fps. It is
  assumed that the smoke starts at frame <math|0>.>

  <\big-table>
    \ <vspace|0.0cm>

    <with|font-size|0.84|<assign|arraystretch|<macro|1.5>>
    <assign|tabcolsep|<macro|2pt>><tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|7|7|cell-halign|c>|<cwith|1|-1|7|7|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|4|4|cell-col-span|4>|<cwith|1|1|4|4|cell-halign|c>|<cwith|1|1|4|4|cell-rborder|1ln>|<cwith|2|2|4|4|cell-col-span|4>|<cwith|2|2|4|4|cell-halign|c>|<cwith|2|2|4|4|cell-rborder|1ln>|<cwith|2|2|4|7|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<cwith|5|5|1|-1|cell-bborder|1ln>|<cwith|6|6|1|-1|cell-bborder|1ln>|<cwith|7|7|1|-1|cell-bborder|1ln>|<cwith|8|8|1|-1|cell-bborder|1ln>|<cwith|9|9|1|-1|cell-bborder|1ln>|<cwith|10|10|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|Frame>|<cell|Video>|<cell|<multirow|2|*|Average
    Errors (<math|\<times\>10<rsup|-3>>) >>|<cell|>|<cell|>|<cell|>>|<row|<cell|Video>|<cell|Rate>|<cell|Duration>|<cell|>|<cell|>|<cell|>|<cell|>>|<row|<cell|Sequence>|<cell|(fps)>|<cell|(sec.)>|<cell|POCS>|<cell|Universal>|<cell|Fixed>|<cell|EADF>>|<row|<cell|>|<cell|>|<cell|>|<cell|Based>|<cell|>|<cell|Weights>|<cell|Based>>|<row|<cell|V8>|<cell|7>|<cell|480>|<cell|7.0076>|<cell|94.6995>|<cell|138.2102>|<cell|9.3712>>|<row|<cell|V9>|<cell|25>|<cell|300>|<cell|8.3375>|<cell|38.2390>|<cell|54.9168>|<cell|5.5494>>|<row|<cell|V10>|<cell|25>|<cell|600>|<cell|8.9892>|<cell|77.5699>|<cell|101.7512>|<cell|3.2637>>|<row|<cell|V11>|<cell|10>|<cell|900>|<cell|5.2054>|<cell|23.6602>|<cell|30.0455>|<cell|2.4314>>|<row|<cell|V12>|<cell|7>|<cell|60>|<cell|15.5350>|<cell|98.0371>|<cell|136.8163>|<cell|12.1520>>|<row|<cell|Average>|<cell|->|<cell|->|<cell|9.0149>|<cell|66.4411>|<cell|92.3480>|<cell|6.5535>>>>>
    >

    <label|forest>
  </big-table|Average squared pixel errors of different methods for video
  sequences without any wildfire smoke.>

  <with|par-columns|1|<big-figure|<image|errors3.eps|6.5in|||><label|fig:errors>
  |Average squared pixel errors for POCS and EADF based algorithms for the
  video sequence <math|V12>.>>

  The proposed method produces the lowest average error in our data set. A
  set of video clips containing moving cloud shadows and other moving regions
  that usually cause false alarms is used to generate
  Table<nbsp><reference|forest>. These video clips were selected specifically
  for this purpose. The table shows the average pixel classification error
  for each method. The average pixel error for a video sequence <math|v> is
  calculated as follows:

  <\equation>
    <wide|E|\<bar\>><around|(|v|)>=<frac|1|F<rsub|I>>*<big|sum><rsub|n=1><rsup|F<rsub|I>><around|(|<frac|e<rsub|n>|N<rsub|I>>|)>
  </equation>

  where <math|N<rsub|I>> is the total number of pixels in the image frame,
  <math|F<rsub|I>> is the number of frames in the video sequence, and
  <math|e<rsub|n>> is the sum of the squared errors for each classified pixel
  in image frame <math|n>. Except for one video sequence (<math|V8>), the
  EADF based method has the lowest pixel classification error.
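
  As a small worked illustration of the average pixel error, the Python
  sketch below evaluates <math|<wide|E|\<bar\>><around|(|v|)>> from the
  per-frame error sums <math|e<rsub|n>>. The function and argument names are
  assumptions, and the per-frame sums are taken as given.

```python
def average_pixel_error(frame_errors, pixels_per_frame):
    """Mean over all frames of e_n / N_I.

    frame_errors: list of e_n, the summed squared pixel error of frame n.
    pixels_per_frame: N_I, the number of pixels in one image frame.
    """
    num_frames = len(frame_errors)  # F_I
    return sum(e / pixels_per_frame for e in frame_errors) / num_frames
```

  For two frames of 100 pixels each with error sums 10 and 30, the average
  pixel error is 0.2.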

  In Fig.<nbsp><reference|fig:errors>, the squared pixel errors of the POCS
  and EADF based schemes are compared for the video clip <math|V12>. The
  weights are updated until the 125th frame for both algorithms. The POCS
  based algorithm has an initial stage until the 30th frame, during which the
  error gradually drops to zero, whereas the EADF algorithm converges after
  only 2 frames. The tracking performance of the EADF algorithm is also
  better than that of the POCS based algorithm, as can be observed after
  frame number 180, at which some of the sub-algorithms issue false alarms.

  The software is currently being used in <math|59> forest watch towers in
  Turkey.

  <subsection|Experiments on a UCI Dataset>

  The proposed method is also tested with a dataset from the UCI (University
  of California, Irvine) machine learning repository to evaluate the
  performance of the algorithm in combining different classifiers. In the
  wildfire detection case, the image data arrives sequentially and the
  decision weights are updated in real-time. The UCI data sets, on the other
  hand, are fixed. Therefore, the dataset is divided into two parts: the
  first part is used for training and the second part for testing.

  During the training phase, weights of different classifiers are determined
  using the EADF update method. In the testing stage, the fixed weights
  obtained from the training stage are used to combine the decisions of the
  classifiers. The data are processed in a sequential manner because both the
  POCS and EADF frameworks assume that new data arrive sequentially.

  The test is performed on the ionosphere data from the UCI machine learning
  repository, which consists of radar measurements to detect the existence of
  free electrons that form a structure in the atmosphere. The electrons that
  show some kind of structure in the ionosphere return \PGood\Q responses;
  the others return \PBad\Q responses. There are 351 samples with 34-element
  feature vectors that are obtained by passing the radar signals through an
  autocorrelation function. In<nbsp><cite|ion>, the first 200 samples are
  used as the training set to classify the remaining 151 test samples. The
  authors obtained 90.7% accuracy with a linear perceptron, 92% accuracy with
  a non-linear perceptron, and 96% accuracy with a back propagation neural
  network.

  For this test SVM, k-nn (k-Nearest Neighbor) and NCC (normalized
  cross-correlation) classifiers are used. Also, in this classification the
  decision functions of these classifiers produce binary values with 1
  corresponding to \PGood\Q classification and -1 corresponding to \PBad\Q
  classification rather than scaled posterior probabilities in the range
  <math|[-1,1]>.

  The accuracies of the sub-algorithms and EADF are shown in
  Table<nbsp><reference|iones>. The success rates of the proposed EADF and
  POCS methods are both 98.01%, which is higher than those of all the
  sub-algorithms. Both the entropic projection and orthogonal projection
  based algorithms converge to a solution in the intersection of the convex
  sets. It turns out that they both converge to the same solution in this
  particular case. This is possible when the intersection set of convex sets
  is small. The proposed EADF method is actually developed for real-time
  applications in which data arrives sequentially. This example is included
  to show that the EADF scheme can also be used with other datasets. It may
  be possible to get better classification results with other classifiers in
  this fixed UCI dataset.

  <\big-table>
    \;

    <with|font-size|0.84|<assign|arraystretch|<macro|1.5>>
    <assign|tabcolsep|<macro|2pt>><tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|1|cell-row-span|2>|<cwith|1|1|1|1|cell-valign|c>|<cwith|1|1|2|2|cell-col-span|5>|<cwith|1|1|2|2|cell-halign|c>|<cwith|1|1|2|2|cell-rborder|1ln>|<cwith|1|1|2|6|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<table|<row|<cell|Data>|<cell|Success
    Rates (%)>|<cell|>|<cell|>|<cell|>|<cell|>>|<row|<cell|>|<cell|SVM>|<cell|k-nn
    (k=4)>|<cell|NCC>|<cell|POCS>|<cell|EADF>>|<row|<cell|Train>|<cell|100.0>|<cell|91.50>|<cell|100.0>|<cell|100.0>|<cell|100.0>>|<row|<cell|Test>|<cell|94.03>|<cell|97.35>|<cell|91.39>|<cell|98.01>|<cell|98.01>>>>>
    >

    <label|iones>
  </big-table|Accuracies of sub-algorithms and EADF on ionosphere dataset.>

  <section|Conclusion><label|sec:conclusion>

  An entropy functional based online adaptive decision fusion (EADF) is
  proposed for image analysis and computer vision applications with drifting
  concepts. In this framework, it is assumed that the main algorithm for a
  specific application is composed of several sub-algorithms, each of which
  yields its own decision as a real number centered around zero
  representing its confidence level. Decision values are linearly combined
  with weights which are updated online by performing non-orthogonal
  e-projections onto convex sets describing sub-algorithms. This general
  framework is applied to a real computer vision problem of wildfire
  detection. The proposed adaptive decision fusion strategy takes into
  account the feedback from guards of forest watch towers. Experimental
  results show that the learning duration is decreased with the proposed
  online adaptive fusion scheme. It is also observed that the error rate of
  the proposed method is the lowest in our data set, compared to the
  universal linear predictor (ULP) and the projection onto convex sets (POCS)
  based schemes.

  The proposed framework for decision fusion is suitable for problems with
  concept drift. At each stage of the algorithm, the method tracks the
  changes in the nature of the problem by performing a non-orthogonal
  e-projection onto a hyperplane describing the decision of the oracle.

  <section*|Acknowledgment>

  This work was supported in part by the Scientific and Technical Research
  Council of Turkey, TUBITAK, under grant nos. 106G126 and 105E191, in part
  by the European Commission 6th Framework Program under grant number FP6-507752
  (MUSCLE Network of Excellence Project) and in part by FIRESENSE (Fire
  Detection and Management through a Multi-Sensor Network for the Protection
  of Cultural Heritage Areas from the Risk of Fire and Extreme Weather
  Conditions, FP7-ENV-2009-1244088-FIRESENSE) .

  <\bibliography|bib|IEEEbib|refer>
    <bib-list|[99]|>
  </bibliography>
</body>