<TeXmacs|1.0.7>

<style|<tuple|acmconf|vdh>>

<\body>
  <doc-data|<doc-title|Relaxed multiplication using the middle
  product>|<doc-author-data|<author-name|Joris van der
  Hoeven>|<\author-address>
    <abbr|Dépt.> de Mathématiques (<abbr|Bât.> 425)

    Université Paris-Sud

    91405 Orsay Cedex

    France

    Email: <verbatim|joris@texmacs.org>
  </author-address>>|<doc-date|<date|%B %d, %Y>>>

  <\abstract>
    In previous work, we have introduced the technique of relaxed power
    series computations. With this technique, it is possible to solve
    implicit equations almost as quickly as doing the operations which occur
    in the implicit equation. In this paper, we present a new relaxed
    multiplication algorithm for the resolution of linear equations. The
    algorithm has the same asymptotic time complexity as our previous
    algorithms, but we improve the space overhead in the divide and conquer
    model and the constant factor in the <abbr|F.F.T.> model.
  </abstract>

  <assign|fun|<macro|f|<with|mode|text|<with|font-family|ss|<arg|f>>>>><assign|type|<macro|f|<with|mode|text|<with|font-family|ss|<arg|f>>>>><with|mode|math|<assign|C|\<cal-C\>><assign|R|\<cal-R\>><assign|K|\<cal-K\>>><section|Introduction>

  Let <with|mode|math|\<cal-R\>> be an effective ring and consider two power
  series <with|mode|math|f=f<rsub|0>+f<rsub|1>*z+\<cdots\>> and
  <with|mode|math|g=g<rsub|0>+g<rsub|1>*z+\<cdots\>> in
  <no-break><with|mode|math|\<cal-R\>[[z]]>. In this paper we will be
  concerned with the efficient computation of the first <with|mode|math|n>
  coefficients of the product <with|mode|math|h=f*g=h<rsub|0>+h<rsub|1>*z+\<cdots\>>.

  If the first <with|mode|math|n> coefficients of <with|mode|math|f> and
  <with|mode|math|g> are known beforehand, then we may use any fast
  multiplication for polynomials in order to achieve this goal, such as
  divide and conquer multiplication <cite|Kar63|Kn97>, which has a time
  complexity <with|mode|math|K(n)=O(n<rsup|log 3/log 2>)>, or <abbr|F.F.T.>
  multiplication <cite|CT65|SS71|CK91|vdH:relax>, which has a time complexity
  <with|mode|math|M(n)=O(n*log <no-break>n*log <no-break>log n)>.

  For simplicity, ``time complexity'' stands for the required number of
  operations in <with|mode|math|\<cal-R\>>. Similarly, ``space complexity''
  will stand for the number of elements of <with|mode|math|\<cal-R\>> which
  need to be stored. The required number of multiplications
  <with|mode|math|K<rsub|\<times\>>(n)> in the divide and conquer algorithm
  satisfies the following recurrence relations:

  <\eqnarray*>
    <tformat|<table|<row|<cell|K<rsub|\<times\>>(1)>|<cell|=>|<cell|1>>|<row|<cell|K<rsub|\<times\>>(n)>|<cell|=>|<cell|2*K<rsub|\<times\>>(\<lceil\>n/2\<rceil\>)+K<rsub|\<times\>>(\<lfloor\>n/2\<rfloor\>)>>>>
  </eqnarray*>

  When computing only the product truncated at order
  <with|mode|math|n>, the number of multiplications
  <with|mode|math|K<rsub|\<times\>><rsup|\<ast\>>(n)> needed by the divide and
  conquer algorithm satisfies

  <\eqnarray*>
    <tformat|<table|<row|<cell|K<rsub|\<times\>><rsup|\<ast\>>(1)>|<cell|=>|<cell|1>>|<row|<cell|K<rsub|\<times\>><rsup|\<ast\>>(n)>|<cell|=>|<cell|K<rsub|\<times\>>(\<lceil\>n/2\<rceil\>)+2*K<rsub|\<times\>><rsup|\<ast\>>(\<lfloor\>n/2\<rfloor\>)>>>>
  </eqnarray*>
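
  As a small illustration, the two recurrences above can be evaluated
  directly. The sketch below is ours and not part of the original algorithms;
  the names <verbatim|k_mul> and <verbatim|k_mul_truncated> are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def k_mul(n: int) -> int:
    """Multiplications used by a full divide and conquer product of size n."""
    if n == 1:
        return 1
    # K(n) = 2*K(ceil(n/2)) + K(floor(n/2))
    return 2 * k_mul((n + 1) // 2) + k_mul(n // 2)


@lru_cache(maxsize=None)
def k_mul_truncated(n: int) -> int:
    """Multiplications used by the product truncated at order n."""
    if n == 1:
        return 1
    # K*(n) = K(ceil(n/2)) + 2*K*(floor(n/2))
    return k_mul((n + 1) // 2) + 2 * k_mul_truncated(n // 2)


if __name__ == "__main__":
    for n in (3, 6, 8, 100):
        print(n, k_mul(n), k_mul_truncated(n))
```

  For instance, for <with|mode|math|n=6> the recurrences give
  <with|mode|math|21> multiplications for the full product and
  <with|mode|math|17> for the truncated one, whereas both counts coincide
  when <with|mode|math|n> is a power of two.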

  <yes-indent>For certain computations, and most importantly the resolution
  of implicit equations, it is interesting to have so-called ``relaxed
  algorithms'' which output the first <with|mode|math|i> coefficients of
  <with|mode|math|h> as soon as the first <with|mode|math|i> coefficients of
  <with|mode|math|f> and <with|mode|math|g> are known for each
  <with|mode|math|i\<leqslant\><no-break>n>. This allows for instance the
  computation of the exponential <with|mode|math|g=exp f> of a series
  <with|mode|math|f> with <with|mode|math|f<rsub|0>=0> using the formula

  <\equation>
    <label|exp-form>g=<big|int>f<rprime|'>*g.
  </equation>
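
  To make the relaxed character of this formula concrete, here is a minimal
  sketch, assuming exact rational coefficients and a plain quadratic
  convolution instead of a fast relaxed product. The function name
  <verbatim|exp_series> is hypothetical. The point is only that each new
  coefficient of <with|mode|math|g> depends on previously computed ones.

```python
from fractions import Fraction


def exp_series(f, n):
    """First n coefficients of g = exp(f), assuming f[0] == 0.

    Uses g = int(f' * g): the coefficient g_m only depends on g_0..g_{m-1}.
    The inner sum is a naive convolution (an assumption of this sketch).
    """
    assert f[0] == 0
    g = [Fraction(1)]
    for m in range(1, n):
        # coefficient of z^(m-1) in f' * g is sum_{j=1..m} j*f_j*g_{m-j}
        c = sum(j * f[j] * g[m - j] for j in range(1, m + 1) if j < len(f))
        g.append(Fraction(c) / m)  # integration divides by m
    return g


if __name__ == "__main__":
    print(exp_series([0, 1], 6))  # coefficients of exp(z)
```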

  In <cite|vdH:issac97|vdH:relax>, we proved the following two theorems:

  <\theorem>
    <label|dc-th>There exists a relaxed multiplication algorithm of time
    complexity <with|mode|math|K(n)> and space complexity
    <with|mode|math|O(n*log n)>, and which uses
    <with|mode|math|K<rsub|\<times\>>(n)> multiplications.
  </theorem>

  <\theorem>
    There exists a relaxed multiplication algorithm of time complexity
    <with|mode|math|O(M(n)*log n)> and space complexity
    <with|mode|math|O(n)>.
  </theorem>

  Although these theorems are satisfactory from a theoretical point of view,
  they can be improved in two directions: by removing the logarithmic space
  overhead in the divide and conquer model and by improving the constant
  factor in the <abbr|F.F.T.> model.

  In this paper, we will present such an improved algorithm in the case of
  relaxed multiplication with a fixed series. More precisely, let
  <with|mode|math|f> and <with|mode|math|g> be power series, such that
  <with|mode|math|g> is known up to order <with|mode|math|n>. Then our
  algorithm will compute the product <with|mode|math|h=f*g> up to order
  <with|mode|math|n> and output <with|mode|math|(f*g)<rsub|i>> as soon as
  <with|mode|math|f<rsub|0>,\<ldots\>,f<rsub|i>> are known, for all
  <no-break><with|mode|math|i\<less\>n>. We will prove the following:

  <\theorem>
    <label|new-dc-th>There exists a relaxed multiplication algorithm with
    fixed series of time complexity <no-break><with|mode|math|O(K(n))>, of
    space complexity <with|mode|math|O(n)>, and which uses
    <with|mode|math|K<rsub|\<times\>><rsup|\<ast\>>(n)> multiplications.
  </theorem>

  We also obtain a better constant factor in the asymptotic complexity in the
  <abbr|F.F.T.> model, but this result is harder to state in a precise
  theorem.

  The algorithm is useful for the relaxed resolution of linear differential
  or difference equations. For instance, the exponential of a series can be
  computed using <with|mode|math|<op|\<leqslant\>>K<rsub|\<times\>><rsup|\<ast\>>(n)>
  multiplications in <with|mode|math|\<cal-R\>>. Moreover, the new algorithm
  is very simple to implement, so it is likely to require less overhead than
  the algorithm from theorem <reference|dc-th>.

  Our algorithm is based on the recent middle product algorithm
  <no-break><cite|HaQuZi00|HaQuZi02>, which is recalled in section
  <no-break><reference|mid-prod>. In section <reference|main-alg> we present
  our new algorithm and in section <reference|appls> we give some
  applications.

  In our algorithms we will use the following notations: the data type
  <with|mode|math|<type|TPS>(n)> stands for truncated power series of order
  <with|mode|math|n>, like <with|mode|math|f=f<rsub|0>+\<cdots\>+f<rsub|n-1>*z<rsup|n-1>>.
  Given <with|mode|math|f\<in\><type|TPS>(n)> and
  <with|mode|math|0\<leqslant\>i\<less\>j\<leqslant\>n>, we will denote
  <with|mode|math|f<rsub|i\<ldots\>j>=f<rsub|i>+\<cdots\>+f<rsub|j-1>*z<rsup|j-i-1>\<in\><type|TPS>(j-i)>.
  Given <with|mode|math|f\<in\><type|TPS>(m)> and
  <with|mode|math|g\<in\><type|TPS>(n)>, we also denote
  <with|mode|math|f\<join\>g=f+g*z<rsup|m>\<in\><type|TPS>(m+n)>. We will
  denote by <with|mode|math|<type|Ref>(<type|TPS>(n))> the type, whose
  elements are references to elements of type <with|mode|math|<type|TPS>(n)>.
  If <with|mode|math|f\<in\><type|Ref>(<type|TPS>(n))> and
  <with|mode|math|0\<leqslant\>i\<less\>j\<leqslant\>n>, then we assume that
  <with|mode|math|f<rsub|i\<ldots\>j>\<in\><type|Ref>(<type|TPS>(j-i))>.

  <section|The middle product><label|mid-prod>

  Let <with|mode|math|f=f<rsub|0>+\<cdots\>+f<rsub|n-1>*z<rsup|n-1>> and
  <with|mode|math|g=g<rsub|0>+\<cdots\>+g<rsub|2*n-2>*z<rsup|2*n-2>> be two
  truncated power series of orders <with|mode|math|n> and
  <with|mode|math|2*n-1> respectively. The <em|middle product> of
  <with|mode|math|f> and <with|mode|math|g> is defined to be the truncated
  power series
  <with|mode|math|h=f\<ast\>g=h<rsub|0>+\<cdots\>+h<rsub|n-1>*z<rsup|n-1>> of
  order <with|mode|math|n>, such that <with|mode|math|h<rsub|i>=<big|sum><rsub|j=0><rsup|n-1>f<rsub|j>*g<rsub|n-1+i-j>>
  for all <with|mode|math|i\<in\>{0,\<ldots\>,n-1}>. In figure
  <reference|midprod>, <with|mode|math|h> corresponds to the colored region.

  \;

  <big-figure|<postscript|midprod.ps|*.4|*.4||||>|<label|midprod>Illustration
  of the middle product.>

  The middle product of <with|mode|math|f=f<rsub|0>+f<rsub|1>*z> and
  <with|mode|math|g=g<rsub|0>+g<rsub|1>*z+g<rsub|2>*z<rsup|2>> can be
  computed using only three multiplications, using the following trick:

  <\eqnarray*>
    <tformat|<table|<row|<cell|\<alpha\>>|<cell|=>|<cell|f<rsub|1>*(g<rsub|0>+g<rsub|1>)>>|<row|<cell|\<beta\>>|<cell|=>|<cell|(f<rsub|1>-f<rsub|0>)*g<rsub|1>>>|<row|<cell|\<gamma\>>|<cell|=>|<cell|f<rsub|0>*(g<rsub|1>+g<rsub|2>)>>|<row|<cell|h<rsub|0>>|<cell|=>|<cell|\<alpha\>-\<beta\>>>|<row|<cell|h<rsub|1>>|<cell|=>|<cell|\<gamma\>+\<beta\>>>>>
  </eqnarray*>
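
  The identity can be checked against the definition with a few lines of
  code. This is only a sketch; the names <verbatim|middle_product_naive> and
  <verbatim|middle_product_2> are ours.

```python
def middle_product_naive(f, g):
    """Middle product by the definition: h_i = sum_j f_j * g_{n-1+i-j}."""
    n = len(f)
    if len(g) != 2 * n - 1:
        raise ValueError("g must have 2*len(f) - 1 coefficients")
    return [sum(f[j] * g[n - 1 + i - j] for j in range(n)) for i in range(n)]


def middle_product_2(f, g):
    """Middle product of sizes (2, 3) with three multiplications."""
    f0, f1 = f
    g0, g1, g2 = g
    alpha = f1 * (g0 + g1)
    beta = (f1 - f0) * g1
    gamma = f0 * (g1 + g2)
    return [alpha - beta, gamma + beta]


if __name__ == "__main__":
    f, g = [2, 3], [5, 7, 11]
    print(middle_product_naive(f, g), middle_product_2(f, g))
```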

  This trick may be applied recursively in order to yield an algorithm which
  needs exactly the same number of multiplications
  <with|mode|math|K<rsub|\<times\>>(n)> as the divide and conquer algorithm
  for the computation of the product of two polynomials of degree
  <with|mode|math|n-1>. More precisely, the following recursive algorithm
  comes from <no-break><cite|HaQuZi00|HaQuZi02>.

  <\algorithm|>
    <with|mode|math|f\<ast\>g><no-page-break>

    <item*|Input><with|mode|math|f\<in\><type|TPS>(n)> and
    <with|mode|math|g\<in\><type|TPS>(2*n-1)><no-page-break>

    <item*|Output>their middle product <with|mode|math|f\<ast\>g\<in\><type|TPS>(n)><no-page-break>

    <\with|par-par-sep|0cm>
      <\body>
        <strong|if> <with|mode|math|n=1> <strong|then> <strong|return>
        <with|mode|math|f<rsub|0>*g<rsub|0>><no-page-break>

        <\with|mode|math>
          k\<assign\>\<lfloor\>n/2\<rfloor\>,l\<assign\>\<lceil\>n/2\<rceil\><no-page-break>
        </with>

        <with|mode|math|\<alpha\>\<assign\>f<rsub|k\<ldots\>n>\<ast\>(g<rsub|0\<ldots\>2*l-1>+g<rsub|l\<ldots\>3*l-1>)><no-page-break>

        <strong|if> <with|mode|math|n> is even<no-page-break>

        <\indent>
          <strong|then> <with|mode|math|\<beta\>\<assign\>(f<rsub|l\<ldots\>n>-f<rsub|0\<ldots\>k>)\<ast\>g<rsub|l\<ldots\>3*l-1>><no-page-break>

          <strong|else> <with|mode|math|\<beta\>\<assign\>[f<rsub|k>\<join\>(f<rsub|l\<ldots\>n>-f<rsub|0\<ldots\>k>)]\<ast\>g<rsub|l\<ldots\>3*l-1>><no-page-break>
        </indent>

        <with|mode|math|\<gamma\>\<assign\>f<rsub|0\<ldots\>k>\<ast\>(g<rsub|l\<ldots\>l+2*k-1>+g<rsub|2*l\<ldots\>2*n-1>)><no-page-break>

        <strong|return> <with|mode|math|(\<alpha\>-\<beta\>)\<join\>(\<gamma\>+\<beta\><rsub|0\<ldots\>k>)>
      </body>
    </with>
  </algorithm>
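
  The following sketch transcribes the recursive algorithm above, with list
  slices standing in for the ranges <with|mode|math|f<rsub|i\<ldots\>j>>. It
  is an illustration under our own conventions, not the reference
  implementation of the cited papers: the function name and the one-element
  list <verbatim|count>, used as a multiplication counter, are assumptions of
  the sketch.

```python
def middle_product(f, g, count):
    """Recursive middle product of f (n coefficients) and g (2n-1 coefficients).

    count is a one-element list incremented at each coefficient multiplication.
    """
    n = len(f)
    assert len(g) == 2 * n - 1
    if n == 1:
        count[0] += 1
        return [f[0] * g[0]]
    k, l = n // 2, (n + 1) // 2

    def add(a, b):
        return [x + y for x, y in zip(a, b)]

    def sub(a, b):
        return [x - y for x, y in zip(a, b)]

    alpha = middle_product(f[k:n], add(g[0:2 * l - 1], g[l:3 * l - 1]), count)
    diff = sub(f[l:n], f[0:k])
    if n % 2 == 1:
        diff = [f[k]] + diff  # odd case: prepend f_k
    beta = middle_product(diff, g[l:3 * l - 1], count)
    gamma = middle_product(
        f[0:k], add(g[l:l + 2 * k - 1], g[2 * l:2 * n - 1]), count
    )
    # (alpha - beta) followed by (gamma + beta_{0..k})
    return sub(alpha, beta) + add(gamma, beta[0:k])


if __name__ == "__main__":
    counter = [0]
    print(middle_product([1, 2, 3], [4, 5, 6, 7, 8], counter), counter[0])
```

  The slices copy their data, so this version does not reflect the space
  behaviour of an in-place implementation; it only serves to check the
  arithmetic and the multiplication count.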

  In <cite|HaQuZi02> it is also shown that, in the <abbr|F.F.T.> model, the
  middle product can still be computed in essentially the same time as the
  product of two polynomials.

  <section|Relaxed multiplication with a fixed series><label|main-alg>

  Let <with|mode|math|f> and <with|mode|math|g> be power series, such that
  <with|mode|math|g> is known up to order <with|mode|math|n>. In this
  section, we present an algorithm which computes the product
  <with|mode|math|h=f*g> up to order <with|mode|math|n>. For each
  <with|mode|math|i\<less\>n>, the algorithm outputs
  <with|mode|math|(f*g)<rsub|i>> as soon as
  <with|mode|math|f<rsub|0>,\<ldots\>,f<rsub|i>> are known.

  The idea of our algorithm is similar to the idea behind fast relaxed
  multiplication in <cite|vdH:issac97|vdH:relax> and is based on a subdivision
  of the triangular area which corresponds to the computation of the
  truncated power series product. This subdivision is shown in figures
  <no-break><reference|midrelax> and <reference|midex-fig>, where each
  parallelogram corresponds to the computation of a middle product.

  <big-figure|<postscript|midrelax.ps|*.4|*.4||||>|<label|midrelax>The
  subdivision used for the new relaxed multiplication algorithm.>

  More precisely, let <with|mode|math|k=\<lfloor\>n/2\<rfloor\>> and
  <with|mode|math|l=\<lceil\>n/2\<rceil\>>, and assume that
  <with|mode|math|f<rsub|0>,\<ldots\>,f<rsub|l-1>> are known. Then the
  contribution of <with|mode|math|f<rsub|0\<ldots\>l>\<ast\>g<rsub|n+1-2l\<ldots\>n>>
  to <with|mode|math|f*g> may be computed using the middle product algorithm
  from the previous section. The relaxed truncated products
  <with|mode|math|f<rsub|0\<ldots\>k>*g<rsub|0\<ldots\>k>> and
  <with|mode|math|f<rsub|l\<ldots\>n>*g<rsub|0\<ldots\>k>> may be computed
  recursively.

  In order to implement this idea, we will use an in-place algorithm, which
  adds the result of <with|mode|math|h=f*g> to a reference
  <with|mode|math|\<varphi\>> to an element of
  <with|mode|math|<type|TPS>(n)>. Denote by
  <with|mode|math|\<varphi\><rsub|init>> the initial value of
  <with|mode|math|\<varphi\>>. Then the in-place algorithm should be called
  successively for <with|mode|math|i=1,\<ldots\>,<no-break>n>. After the last
  call, we have <with|mode|math|\<varphi\>=\<varphi\><rsub|init>+<no-break>h>.
  Taking <with|mode|math|\<varphi\><rsub|init>=0>, the algorithm computes
  <no-break><with|mode|math|h>.

  <\algorithm|>
    <with|mode|math|<fun|relaxed-muladd>(f,g,\<varphi\>,i)><no-page-break>

    <item*|Input><with|mode|math|f,g\<in\><type|TPS>(n)>,
    <with|mode|math|\<varphi\>\<in\><type|Ref>(<type|TPS>(n))>,
    <with|mode|math|i\<leqslant\>n>.<no-page-break>

    <item*|Action>we have <with|mode|math|\<varphi\><rsub|0\<ldots\>i>=\<varphi\><rsub|init>+f<rsub|0\<ldots\>i>*g<rsub|0\<ldots\>i>>
    on exit.<no-page-break>

    <\with|par-par-sep|0cm>
      <\body>
        <strong|if> <with|mode|math|i=n=1> <strong|then>
        <with|mode|math|\<varphi\><rsub|0>\<plusassign\>f<rsub|0>*g<rsub|0>>
        and <strong|return>

        <\with|mode|math>
          k\<assign\>\<lfloor\>n/2\<rfloor\>,l\<assign\>\<lceil\>n/2\<rceil\>
        </with>

        <strong|if> <with|mode|math|i\<leqslant\>k> <strong|then>
        <with|mode|math|<fun|relaxed-muladd>(f<rsub|0\<ldots\>k>,g<rsub|0\<ldots\>k>,\<varphi\><rsub|0\<ldots\>k>,i)>

        <strong|if> <with|mode|math|i=k+1> <strong|then>
        <with|mode|math|\<varphi\><rsub|k\<ldots\>n>\<plusassign\>f<rsub|0\<ldots\>l>\<ast\>g<rsub|n+1-2l\<ldots\>n>>

        <strong|if> <with|mode|math|i\<gtr\>l> <strong|then>
        <with|mode|math|<fun|relaxed-muladd>(f<rsub|l\<ldots\>n>,g<rsub|0\<ldots\>k>,\<varphi\><rsub|l\<ldots\>n>,i-l)>
      </body>
    </with>
  </algorithm>
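
  A possible transcription of <with|mode|math|<fun|relaxed-muladd>> is
  sketched below. It makes a few assumptions that are ours: references are
  modelled by an offset <verbatim|off> into shared lists, the window size is
  passed as <verbatim|n>, and the middle product is computed by a quadratic
  helper, for which any faster routine could be substituted. Unknown
  coefficients of <with|mode|math|f> are stored as <verbatim|None>, so that
  reading one too early would raise an error.

```python
def middle_product(f, g):
    """Quadratic reference middle product (placeholder for a fast routine)."""
    n = len(f)
    return [sum(f[j] * g[n - 1 + i - j] for j in range(n)) for i in range(n)]


def relaxed_muladd(f, g, phi, i, off=0, n=None):
    """Call number i (1-based) of the in-place update phi += f*g.

    The current window is f[off:off+n] and phi[off:off+n], multiplied by the
    prefix g[0:n]. Only the coefficients f[0:off+i] are read.
    """
    if n is None:
        n = len(phi)
    if n == 1:
        phi[off] += f[off] * g[0]
        return
    k, l = n // 2, (n + 1) // 2
    if i <= k:
        relaxed_muladd(f, g, phi, i, off, k)
    if i == k + 1:
        h = middle_product(f[off:off + l], g[n + 1 - 2 * l:n])
        for t in range(l):
            phi[off + k + t] += h[t]
    if i > l:
        relaxed_muladd(f, g, phi, i - l, off + l, k)


if __name__ == "__main__":
    g = [1, 2, 3, 4, 5, 6]            # fixed series, known in advance
    incoming = [2, 0, 1, 5, 3, 4]     # coefficients of f, revealed one by one
    f = [None] * len(g)
    phi = [0] * len(g)
    for i in range(1, len(g) + 1):
        f[i - 1] = incoming[i - 1]    # f_{i-1} becomes available
        relaxed_muladd(f, g, phi, i)  # phi[i-1] is final after this call
    print(phi)
```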

  The number of multiplications <with|mode|math|R<rsub|\<times\>>(n)> used by
  <with|mode|math|<fun|relaxed-muladd>> is determined by the relations

  <\eqnarray*>
    <tformat|<table|<row|<cell|R<rsub|\<times\>>(1)>|<cell|=>|<cell|1<space|0.6spc>;>>|<row|<cell|R<rsub|\<times\>>(n)>|<cell|=>|<cell|K<rsub|\<times\>>(\<lceil\>n/2\<rceil\>)+2*R<rsub|\<times\>>(\<lfloor\>n/2\<rfloor\>).>>>>
  </eqnarray*>

  By induction, it follows that <with|mode|math|R<rsub|\<times\>>(n)=K<rsub|\<times\>><rsup|\<ast\>>(n)>.
  The overall time complexity satisfies

  <\equation*>
    R(n)\<leqslant\>K(\<lceil\>n/2\<rceil\>)+2*R(\<lfloor\>n/2\<rfloor\>)+O(n),
  </equation*>

  so <with|mode|math|R(n)=O(K(n))>. The algorithm being in-place, its space
  complexity is clearly <with|mode|math|O(n)>. This proves theorem
  <reference|new-dc-th>.

  It is also interesting to use the above algorithm in the <abbr|F.F.T.>
  model. We then have the estimation

  <\equation*>
    R(n)\<leqslant\>M(\<lceil\>n/2\<rceil\>)+2*R(\<lfloor\>n/2\<rfloor\>)+O(n)
  </equation*>

  for the asymptotic complexity <with|mode|math|R(n)>. If
  <with|mode|math|M(n)\<sim\>c*n*log <no-break>n*log log n>, this yields

  <\equation*>
    R(n)\<sim\><frac|1|2>*M(n)*log<rsub|2> n.
  </equation*>

  This should be compared with the complexity
  <with|mode|math|R(n)\<sim\>M(n)*log<rsub|2> n> of the previously best
  algorithm and with the complexity <with|mode|math|L(n)\<sim\>2*M(n)*log<rsub|2>
  n> of the standard fast relaxed multiplication algorithm.

  Notice that we rarely obtain the complexity
  <with|mode|math|M(n)\<sim\>c*n*log <no-break>n*log log n> in practice. In
  the range where <with|mode|math|M(n)\<sim\>c*n<rsup|\<alpha\>>>, we obtain

  <\equation*>
    R(n)\<sim\><frac|1|2<rsup|\<alpha\>>-2>*M(n).
  </equation*>
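
  This constant can be checked numerically. The sketch below iterates the
  recurrence for powers of two, assuming <with|mode|math|M(n)=n<rsup|\<alpha\>>>
  exactly, <with|mode|math|R(1)=1> and no linear term; these simplifications
  and the function name are ours.

```python
import math


def ratio_r_over_m(alpha, levels=60):
    """R(n)/M(n) at n = 2**levels, with M(n) = n**alpha and R(1) = 1."""
    r = 1.0
    for p in range(1, levels + 1):
        n = 2.0 ** p
        r = (n / 2) ** alpha + 2 * r  # R(n) = M(n/2) + 2*R(n/2)
    return r / (2.0 ** levels) ** alpha


if __name__ == "__main__":
    for alpha in (1.5, math.log2(3)):
        print(alpha, ratio_r_over_m(alpha), 1 / (2 ** alpha - 2))
```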

  <section|A worked example>

  Let us consider the computation of <with|mode|math|f=\<mathe\><rsup|z/(1-z)>>
  up to order <with|mode|math|7=n+1> using our algorithm and the formula

  <\equation*>
    f=<big|int>f*g,
  </equation*>

  with <with|mode|math|f<rsub|0>=1> and <with|mode|math|g=1+2*z+3*z<rsup|2>+\<cdots\>>.
  We start with <with|mode|math|\<varphi\>=0> in
  <with|mode|math|<fun|relaxed-muladd>> and perform the following
  computations at successive calls for <with|mode|math|i=1,\<ldots\>,6>:

  <\enumerate>
    <item>We set <with|mode|math|\<varphi\><rsub|0>\<plusassign\>f<rsub|0>*g<rsub|0>=1>,
    so that

    <\equation*>
      \<varphi\>\<assign\>1
    </equation*>

    and <with|mode|math|f<rsub|1>=1>.

    <item>We recursively apply <with|mode|math|<fun|relaxed-muladd>> to
    <with|mode|math|f<rsub|0\<ldots\>3>>,
    <with|mode|math|g<rsub|0\<ldots\>3>>,
    <with|mode|math|\<varphi\><rsub|0\<ldots\>3>> and <with|mode|math|i=2>.
    This requires the computation of <with|mode|math|f<rsub|0\<ldots\>2>\<ast\>g<rsub|0\<ldots\>3>=(1+z)\<ast\>(1+2*z+3*z<rsup|2>)=3+5*z>.
    We thus increase <with|mode|math|\<varphi\><rsub|1\<ldots\>3>\<plusassign\>3+5*z>,
    so that

    <\equation*>
      \<varphi\>\<assign\>1+3*z+5*z<rsup|2>
    </equation*>

    <no-page-break*>and <with|mode|math|f<rsub|2>=<frac|3|2>>.

    <item>The two nested recursive calls to
    <with|mode|math|<fun|relaxed-muladd>> now lead to the increase of
    <with|mode|math|\<varphi\><rsub|2>> by
    <with|mode|math|f<rsub|2>*g<rsub|0>=<frac|3|2>>, so that

    <\equation*>
      \<varphi\>\<assign\>1+3*z+<with|math-display|false|<frac|13|2>>*z<rsup|2>
    </equation*>

    and <with|mode|math|f<rsub|3>=<frac|13|6>>.

    <item>We now have both <with|mode|math|i=k+1=4> and
    <with|mode|math|i\<gtr\>l=3>. So we first compute
    <with|mode|math|f<rsub|0\<ldots\>3>\<ast\>g<rsub|1\<ldots\>6>=10+<frac|27|2>*z+17*z<rsup|2>>
    and set <with|mode|math|\<varphi\><rsub|3\<ldots\>6>\<plusassign\>10+<frac|27|2>*z+17*z<rsup|2>>.
    We next recursively apply <with|mode|math|<fun|relaxed-muladd>> to
    <with|mode|math|f<rsub|3\<ldots\>6>>,
    <with|mode|math|g<rsub|0\<ldots\>3>>,
    <with|mode|math|\<varphi\><rsub|3\<ldots\>6>> and <with|mode|math|i=1>,
    which leads to an increase of <with|mode|math|\<varphi\><rsub|3>> by
    <with|mode|math|f<rsub|3>*g<rsub|0>=<frac|13|6>>. Altogether, we obtain

    <\equation*>
      \<varphi\>\<assign\>1+3*z+<with|math-display|false|<frac|13|2>>*z<rsup|2>+<with|math-display|false|<frac|73|6>>*z<rsup|3>+<with|math-display|false|<frac|27|2>>*z<rsup|4>+17*z<rsup|5>
    </equation*>

    and <with|mode|math|f<rsub|4>=<frac|73|24>>.

    <item>We recursively apply <with|mode|math|<fun|relaxed-muladd>> to
    <with|mode|math|f<rsub|3\<ldots\>6>>,
    <with|mode|math|g<rsub|0\<ldots\>3>>,
    <with|mode|math|\<varphi\><rsub|3\<ldots\>6>> and <with|mode|math|i=2>.
    This leads to the increase <with|mode|math|\<varphi\><rsub|4\<ldots\>6>\<plusassign\>f<rsub|3\<ldots\>5>\<ast\>g<rsub|0\<ldots\>3>=<frac|59|8>+<frac|151|12>*z>,
    so that

    <\equation*>
      \<varphi\>\<assign\>1+3*z+<with|math-display|false|<frac|13|2>>*z<rsup|2>+<with|math-display|false|<frac|73|6>>*z<rsup|3>+<with|math-display|false|<frac|167|8>>*z<rsup|4>+<with|math-display|false|<frac|355|12>>*z<rsup|5>
    </equation*>

    and <with|mode|math|f<rsub|5>=<frac|167|40>>.

    <item>The two nested recursive calls lead to the increase
    <with|mode|math|\<varphi\><rsub|5>\<plusassign\>f<rsub|5>*g<rsub|0>=<frac|167|40>>,
    so that

    <\equation*>
      \<varphi\>\<assign\>1+3*z+<with|math-display|false|<frac|13|2>>*z<rsup|2>+<with|math-display|false|<frac|73|6>>*z<rsup|3>+<with|math-display|false|<frac|167|8>>*z<rsup|4>+<with|math-display|false|<frac|4051|120>>*z<rsup|5>
    </equation*>

    and <with|mode|math|f<rsub|6>=<frac|4051|720>>.
  </enumerate>

  The entire computation is represented schematically in figure
  <reference|midex-fig>.

  <big-figure|<postscript|midexample.ps|*.4|*.4||||>|<label|midex-fig>Illustration
  of an order <with|mode|math|6> relaxed multiplication.>
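
  The values obtained in the six steps can be cross-checked independently of
  the relaxed algorithm. The sketch below, whose function name is
  hypothetical, solves the same equation with a naive convolution and exact
  rationals, so it confirms the coefficients but not the order in which the
  relaxed algorithm performs its operations.

```python
from fractions import Fraction


def solve_example(order):
    """Return (f, phi) with f = exp(z/(1-z)) up to z^order and phi = f*g.

    Here g = 1 + 2z + 3z^2 + ... and f = int(f*g) with f_0 = 1.
    """
    g = [Fraction(m + 1) for m in range(order)]
    f = [Fraction(1)]
    phi = []
    for i in range(1, order + 1):
        # (f*g)_{i-1} only involves f_0, ..., f_{i-1}
        phi.append(sum(f[j] * g[i - 1 - j] for j in range(i)))
        f.append(phi[-1] / i)  # integration: f_i = phi_{i-1} / i
    return f, phi


if __name__ == "__main__":
    f, phi = solve_example(6)
    print([str(c) for c in f])
    print([str(c) for c in phi])
```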

  <section|Applications><label|appls>

  First of all, let us consider the problem of relaxed division by a fixed
  power series. In other words, we are given two power series
  <with|mode|math|f> and <with|mode|math|g>, where <with|mode|math|g> is
  known up to order <with|mode|math|n> and <with|mode|math|g<rsub|0>=1>. We
  want an algorithm for the computation of <with|mode|math|h=f/g> up to order
  <no-break><with|mode|math|n>, such that <with|mode|math|h<rsub|i>> is
  computed as soon as <with|mode|math|f<rsub|0>,\<ldots\>,<no-break>f<rsub|i>>
  are known for each <with|mode|math|i\<less\>n>. Now we have

  <\equation*>
    h=f-z*(\<varphi\>*h),
  </equation*>

  where <with|mode|math|\<varphi\>=(g-1)/z\<in\>\<cal-R\>[[z]]>. We may thus
  compute <no-break><with|mode|math|h> in a relaxed way using the algorithm
  from the previous section. Computing <with|mode|math|h> up to
  <with|mode|math|n> terms will then necessitate
  <with|mode|math|<op|\<leqslant\>K<rsub|\<times\>>(n)>> multiplications in
  <with|mode|math|\<cal-R\>>.
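
  The relaxed structure of this division can be sketched as a generator that
  consumes the coefficients of <with|mode|math|f> one at a time. This is
  only an illustration: the product <with|mode|math|\<varphi\>*h> is
  evaluated here by a naive convolution, where the algorithm from the
  previous section would be used in practice, and the name
  <verbatim|relaxed_divide> is ours.

```python
def relaxed_divide(f_coeffs, g):
    """Yield the coefficients of h = f/g one by one, assuming g[0] == 1.

    f_coeffs may be any iterable producing f_0, f_1, ...; the coefficient
    h_i is yielded as soon as f_i has been consumed.
    """
    assert g[0] == 1
    phi = g[1:]  # phi = (g - 1)/z
    h = []
    for i, fi in enumerate(f_coeffs):
        if i >= len(g):
            break  # g is only known up to order len(g)
        # coefficient i of z*(phi*h) involves h_0, ..., h_{i-1} only
        corr = sum(phi[i - 1 - j] * h[j] for j in range(i))
        h.append(fi - corr)
        yield h[-1]


if __name__ == "__main__":
    print(list(relaxed_divide(iter([5, 6, 7, 8]), [1, 2, 3, 4])))
```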

  Let us next consider a linear differential equation

  <\equation>
    <label|lin-diff-eq>L<rsub|r>*f<rsup|(r)>+\<cdots\>+L<rsub|0>*f=0,
  </equation>

  with <with|mode|math|L<rsub|0>,\<ldots\>,L<rsub|r>\<in\>\<cal-R\>[[z]]> and
  <with|mode|math|L<rsub|r>(0)=1>. Given initial conditions for
  <with|mode|math|f<rsub|0>,\<ldots\>,f<rsub|r-1>>, there exists a unique
  solution to this equation. We may compute this solution using the relaxed
  algorithm from the previous section, the above algorithm for relaxed
  division, and the formula

  <\equation*>
    f=-<big|int><above|\<ldots\>|r\<times\>><big|int>L<rsub|r><rsup|-1>*(L<rsub|r-1>*f<rsup|(r-1)>+\<cdots\>+L<rsub|0>*f).
  </equation*>

  In order to compute <with|mode|math|n> coefficients, we need to perform
  <with|mode|math|(r+1)*K<rsub|\<times\>>(n)> multiplications in
  <with|mode|math|\<cal-R\>> and <with|mode|math|O(n)> multiplications and
  divisions by integers. If <with|mode|math|L<rsub|r>=1>, then we only need
  <with|mode|math|r*K<rsub|\<times\>>(n)> multiplications.

  For instance, the exponential <with|mode|math|g> of a series
  <with|mode|math|f> with <with|mode|math|f<rsub|0>=0> satisfies the equation

  <\equation*>
    g<rprime|'>-f<rprime|'>*g=0,
  </equation*>

  so <with|mode|math|g> can be computed using
  <with|mode|math|K<rsub|\<times\>>(n)> multiplications, using the formula
  (<reference|exp-form>).

  More generally, consider the solution to (<reference|lin-diff-eq>) with the
  prescribed initial conditions, and let <with|mode|math|g> be another series
  with <with|mode|math|g<rsub|0>=0>. Then the composition
  <with|mode|math|h=f\<circ\>g> again satisfies a linear differential
  equation. Indeed, we have the relations

  <\eqnarray*>
    <tformat|<table|<row|<cell|f\<circ\>g>|<cell|=>|<cell|h>>|<row|<cell|f<rprime|'>\<circ\>g>|<cell|=>|<cell|<frac|h<rprime|'>|g<rprime|'>>>>|<row|<cell|f<rprime|''>\<circ\>g>|<cell|=>|<cell|<frac|h<rprime|''>|g<rprime|'><rsup|2>>-<frac|h<rprime|'>*g<rprime|''>|g<rprime|'><rsup|3>>>>|<row|<cell|>|<cell|\<vdots\>>|<cell|>>>>
  </eqnarray*>

  Postcomposing (<reference|lin-diff-eq>) with <with|mode|math|g> and using
  these relations, we obtain a linear differential equation for
  <with|mode|math|h>.

  In fact, our algorithm may be used to solve far more general linear
  equations, such as linear partial differential equations, or linear
  differential-difference equations. In the case of difference equations, we
  notice that the relaxed multiplications in the algorithms from
  <cite|vdH:relax> for relaxed right composition with a fixed series all have
  one fixed argument. So we may indeed apply the algorithm from section
  <no-break><reference|main-alg>.

  We finally notice that our algorithm can even be used in a non-linear
  context. Indeed, after computing <with|mode|math|\<lceil\>n/2\<rceil\>>
  coefficients of a truncated relaxed product, the computation of the
  remaining products reduces to the computation of two truncated relaxed
  products with one fixed argument. Actually, this corresponds to an implicit
  application of Newton's method.

  <section|Conclusion and open questions>

  We have presented a new algorithm for relaxed multiplication. Although the
  new algorithm does not yield a significant improvement from the asymptotic
  complexity point of view, we do expect it to be very useful for practical
  applications, such as the exponentiation of power series.

  First of all, the algorithm is easy to implement. Secondly, it only needs a
  linear amount of memory in the range where divide and conquer
  multiplication is appropriate. In combination with <abbr|F.F.T.>
  multiplication, the algorithm yields a better constant factor in the
  asymptotic complexity.

  When implementing a library for power series computations, it is
  interesting to incorporate a mechanism to automatically detect relaxed and
  fixed multiplicands in a complex computation. This is possible by examining
  the dependency graph. With such a mechanism, one may use the new algorithm
  whenever possible.

  Some interesting questions remain open in the divide and conquer model: can
  we apply Mulders' trick <cite|Mul00|HaZi02> for the computation of ``short
  products'' in our setting while maintaining the linear space complexity
  (see figure <reference|mulders-fig>)? In that case, we might improve the
  number of multiplications in theorem <no-break><reference|new-dc-th> to
  <with|mode|math|<op|\<sim\><no-break>>0.808\<cdots\>*K(n)>.

  <big-figure|<postscript|muldmid.ps|*.4|*.4||||>|<label|mulders-fig>Using
  Mulders' trick in combination with the middle product.>

  In a similar vein, does there exist a relaxed multiplication algorithm of
  time complexity <with|mode|math|<op|\<leqslant\><no-break>>K(n)> and linear
  space complexity? This would be so, if the middle product algorithm could
  be made relaxed in an in-place way (the algorithm is already ``essentially
  relaxed'' in the sense of <cite|vdH:issac97|vdH:relax> in the divide and
  conquer model).

  As it stands now, with the above questions still unanswered, the original
  relaxed multiplication algorithm from theorem <reference|dc-th> remains
  best from the time complexity point of view in the divide and conquer
  model. Moreover, Mulders' trick can be applied in this setting, so as to
  yield a short relaxed multiplication algorithm of complexity
  <with|mode|math|<op|\<sim\><no-break>>0.808\<cdots\>*K(n)>, or even better
  <cite|HaZi02>.

  This has surprising consequences for the complexities of several operations
  like short division and square roots: we obtain algorithms of time
  complexities <with|mode|math|<op|\<sim\>>0.808\<cdots\>*K(n)> and
  <with|mode|math|<op|\<sim\>><frac|1|2>*K(n)> when using
  <with|mode|math|O(n*log <no-break>n)> space, while the best known
  algorithms which use linear space have time complexities
  <with|mode|math|<op|\<sim\><no-break>>K(n)> and
  <with|mode|math|<op|\<sim\>><frac|3|4>*K(n)>. In order to obtain the
  complexity of <with|mode|math|<op|\<sim\>><frac|1|2>*K(n)> in the case of
  square roots, one should use a relaxed version of the fast squaring
  algorithm from <cite|HaQuZi02>, which is based on middle products.

  We finally remark that this relaxed version of squaring using middle
  products is also interesting in the <abbr|F.F.T.> model. In this case, the
  relaxed middle product corresponds to a full relaxed product with one fixed
  argument. Such products can be computed in time
  <with|mode|math|<op|\<sim\>>2*R(n)>, so that we obtain a relaxed squaring
  algorithm of time complexity <with|mode|math|<op|\<sim\>>2*R(n)>. This is
  twice as good as general relaxed multiplication. In the non-relaxed
  setting, squares can be computed in a time between
  <with|mode|math|<frac|1|2>*M(n)> and <with|mode|math|<frac|2|3>*M(n)>,
  depending on whether most time is spent on inner multiplications or fast
  Fourier transforms respectively.

  <\bibliography|bib|alpha|relaxed>
    <\bib-list|[99]>
      <bibitem*|CK91><label|bib-CK91>D.G. Cantor and E.<nbsp>Kaltofen.
      <newblock>On fast multiplication of polynomials over arbitrary
      algebras. <newblock><with|font-shape|italic|Acta Informatica>,
      28:693--701, 1991.

      <bibitem*|CT65><label|bib-CT65>J.W. Cooley and J.W. Tukey. <newblock>An
      algorithm for the machine calculation of complex Fourier series.
      <newblock><with|font-shape|italic|Math. Computat.>, 19:297--301, 1965.

      <bibitem*|HQZ00><label|bib-HaQuZi00>Guillaume Hanrot, Michel Quercia,
      and Paul Zimmermann. <newblock>Speeding up the division and square root
      of power series. <newblock>Research Report 3973, INRIA, July 2000.
      <newblock>Available from <with|font-family|tt|http://www.inria.fr/RRRT/RR-3973.html>.

      <bibitem*|HQZ02><label|bib-HaQuZi02>Guillaume Hanrot, Michel Quercia,
      and Paul Zimmermann. <newblock>The middle product algorithm I. Speeding
      up the division and square root of power series. <newblock>Accepted for
      publication in AAECC, 2002.

      <bibitem*|HZ02><label|bib-HaZi02>Guillaume Hanrot and Paul Zimmermann.
      <newblock>A long note on Mulders' short product. <newblock>Research
      Report 4654, INRIA, December 2002. <newblock>Available from
      <with|font-family|tt|http://www.loria.fr/~hanrot/Papers/mulders.ps>.

      <bibitem*|Knu97><label|bib-Kn97>D.E. Knuth.
      <newblock><with|font-shape|italic|The Art of Computer Programming>,
      volume 2: Seminumerical Algorithms. <newblock>Addison-Wesley, 3rd
      edition, 1997.

      <bibitem*|KO63><label|bib-Kar63>A.<nbsp>Karatsuba and J.<nbsp>Ofman.
      <newblock>Multiplication of multidigit numbers on automata.
      <newblock><with|font-shape|italic|Soviet Physics Doklady>, 7:595--596,
      1963.

      <bibitem*|Mul00><label|bib-Mul00>T.<nbsp>Mulders. <newblock>On short
      multiplication and division. <newblock><with|font-shape|italic|AAECC>,
      11(1):69--88, 2000.

      <bibitem*|SS71><label|bib-SS71>A.<nbsp>Schönhage and V.<nbsp>Strassen.
      <newblock>Schnelle Multiplikation grosser Zahlen.
      <newblock><with|font-shape|italic|Computing>, 7:281--292, 1971.

      <bibitem*|vdH97><label|bib-vdH:issac97>J.<nbsp>van<nbsp>der Hoeven.
      <newblock>Lazy multiplication of formal power series. <newblock>In
      W.<nbsp>W. Küchlin, editor, <with|font-shape|italic|Proc. ISSAC '97>,
      pages 17--20, Maui, Hawaii, July 1997.

      <bibitem*|vdH02><label|bib-vdH:relax>J.<nbsp>van<nbsp>der Hoeven.
      <newblock>Relax, but don't be too lazy.
      <newblock><with|font-shape|italic|JSC>, 34:479--542, 2002.
    </bib-list>
  </bibliography>
</body>

<\initial>
  <\collection>
    <associate|language|english>
    <associate|page-medium|paper>
    <associate|page-show-hf|true>
    <associate|par-hyphen|professional>
  </collection>
</initial>

<\references>
  <\collection>
    <associate|appls|<tuple|5|4>>
    <associate|auto-1|<tuple|1|1>>
    <associate|auto-10|<tuple|4|4>>
    <associate|auto-11|<tuple|4|4>>
    <associate|auto-2|<tuple|2|2>>
    <associate|auto-3|<tuple|1|2>>
    <associate|auto-4|<tuple|3|2>>
    <associate|auto-5|<tuple|2|2>>
    <associate|auto-6|<tuple|4|3>>
    <associate|auto-7|<tuple|3|3>>
    <associate|auto-8|<tuple|5|4>>
    <associate|auto-9|<tuple|6|4>>
    <associate|bib-CK91|<tuple|CK91|4>>
    <associate|bib-CT65|<tuple|CT65|4>>
    <associate|bib-HaQuZi00|<tuple|HQZ00|4>>
    <associate|bib-HaQuZi02|<tuple|HQZ02|4>>
    <associate|bib-HaZi02|<tuple|HZ02|4>>
    <associate|bib-Kar63|<tuple|KO63|4>>
    <associate|bib-Kn97|<tuple|Knu97|4>>
    <associate|bib-Mul00|<tuple|Mul00|4>>
    <associate|bib-SS71|<tuple|SS71|4>>
    <associate|bib-vdH:issac97|<tuple|vdH97|4>>
    <associate|bib-vdH:relax|<tuple|vdH02|4>>
    <associate|dc-th|<tuple|1|1>>
    <associate|exp-form|<tuple|1|1>>
    <associate|lin-diff-eq|<tuple|2|4>>
    <associate|main-alg|<tuple|3|2>>
    <associate|mid-prod|<tuple|2|2>>
    <associate|midex-fig|<tuple|3|3>>
    <associate|midprod|<tuple|1|2>>
    <associate|midrelax|<tuple|2|2>>
    <associate|mulders-fig|<tuple|4|4>>
    <associate|new-dc-th|<tuple|3|1>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|bib>
      Kar63

      Kn97

      CT65

      SS71

      CK91

      vdH:relax

      vdH:issac97

      vdH:relax

      HaQuZi00

      HaQuZi02

      HaQuZi00

      HaQuZi02

      HaQuZi02

      vdH:issac97

      vdH:relax

      vdH:relax

      Mul00

      HaZi02

      vdH:issac97

      vdH:relax

      HaZi02

      HaQuZi02
    </associate>
    <\associate|figure>
      <tuple|normal|<label|midprod>Illustration of the middle
      product.|<pageref|auto-3>>

      <tuple|normal|<label|midrelax>The subdivision used for the new relaxed
      multiplication algorithm.|<pageref|auto-5>>

      <tuple|normal|<label|midex-fig>Illustration of an order
      <with|mode|<quote|math>|6> relaxed multiplication.|<pageref|auto-7>>

      <tuple|normal|<label|mulders-fig>Using Mulders' trick in combination
      with the middle product.|<pageref|auto-10>>
    </associate>
    <\associate|toc>
      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Introduction>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>The
      middle product> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Relaxed
      multiplication with a fixed series>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>A
      worked example> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-6><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|5<space|2spc>Applications>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-8><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|6<space|2spc>Conclusion
      and open questions> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-9><vspace|0.5fn>
    </associate>
  </collection>
</auxiliary>