This bugfix grew out of an extended investigation into a problem encountered
by a small number of people running FFADO. FFADO would report that the tx
iso cycle number supplied to the iso tx callback seemingly went backwards -
something which should not ordinarily occur. The bug seemed to be sensitive
to timing and in some cases would disappear when debug traces were inserted
into either FFADO or libraw1394. In essence, libraw1394 was requesting tx
data for cycles which had already been requested.
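
For illustration only, a "backwards" cycle number can be told apart from a normal wrap-around with a check along the following lines. This is a minimal sketch, not code from FFADO or libraw1394: the function names are hypothetical, and it assumes cycle numbers lie in the range 0..7999 (the 8000 isochronous cycles per second of IEEE 1394) and that consecutive callbacks are never more than half a second apart.

```c
#include <assert.h>

#define CYCLES_PER_SECOND 8000  /* IEEE 1394 isochronous cycles per second */

/* Hypothetical helper: signed distance from prev to cur in cycles,
 * wrapped into the range [-4000, 3999]. */
static int cycle_diff(int cur, int prev)
{
    assert(cur >= 0 && cur < CYCLES_PER_SECOND);
    assert(prev >= 0 && prev < CYCLES_PER_SECOND);

    int d = cur - prev;
    if (d >= CYCLES_PER_SECOND / 2)
        d -= CYCLES_PER_SECOND;   /* cur is really slightly behind prev */
    else if (d < -CYCLES_PER_SECOND / 2)
        d += CYCLES_PER_SECOND;   /* cur wrapped past 7999 and is ahead */
    return d;
}

/* Hypothetical helper: non-zero if cur is earlier than prev once
 * wrap-around has been taken into account. */
static int cycle_went_backwards(int cur, int prev)
{
    return cycle_diff(cur, prev) < 0;
}
```

With these definitions, a step from cycle 7995 to cycle 5 counts as moving forward by 10 cycles, while a step from 105 to 100 is flagged as going backwards.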

Initial discussions can be found in the thread "Problem with RME FF800. Can
not start jackd" on the ffado-user mailing list. A followup investigation
is tracked in FFADO ticket number 379
(http://subversion.ffado.org/ticket/379) and referenced in the thread
"Revisiting backward cycle number sequence (ticket 379)" on ffado-devel.
The latter mailing list thread includes a lengthy explanation of what I
think is happening.

To summarise, the root of the problem seems to be that on certain machines
under certain conditions, something causes the kernel to post an iso tx
event at a time when fewer than irq_interval packets have been transmitted.
Unfortunately it has not been possible to determine the underlying cause of
this. Whatever the cause, tests carried out with the reporter of ticket 379
have confirmed that such early events do occur. As a result, the adjustment to
libraw1394's packet_count must be done with reference to the number of
packets reported as transmitted by the kernel instead of simply assuming
that irq_interval packets have been sent.
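
The difference between the two ways of adjusting the count can be sketched as follows. This is not the patch itself: the struct and function names are hypothetical stand-ins, and only the names packet_count and irq_interval are taken from the description above. It assumes the kernel's report of transmitted packets is available as a plain integer.

```c
#include <assert.h>

/* Hypothetical stand-in for the transmit bookkeeping; not the real
 * libraw1394 data structure. */
struct tx_state {
    int packet_count;  /* packets queued but not yet accounted as sent */
    int irq_interval;  /* packets expected to be sent between tx events */
};

/* Old behaviour: assume every tx event means irq_interval packets
 * were transmitted. */
static void account_assumed(struct tx_state *s)
{
    s->packet_count -= s->irq_interval;
}

/* Fixed behaviour: subtract only what the kernel reports as sent. */
static void account_reported(struct tx_state *s, int reported_sent)
{
    assert(reported_sent >= 0 && reported_sent <= s->packet_count);
    s->packet_count -= reported_sent;
}
```

If an event arrives after only 3 of an expected 8 packets have gone out, account_assumed() lowers packet_count by 8 and so loses track of 5 packets that are still queued, whereas account_reported() lowers it by 3.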

A patch implementing this fix is at the end of this post. This fixes the
problem when the newer ABI is in use, which provides tx packet timestamps
(and thus an indication of the number of packets actually transmitted) to
userspace. It does not address the problem when the older ABI is used, but
given the nature of the problem I don't think it's possible to fix it
without access to the timestamps (or at least without some way to determine
the number of packets really transmitted).

Testing by "juanramon" (see ticket 379) has demonstrated that it fixes the
"backward cycle number" problem on his machine.

Thanks to Andreas Hehn and "juanramon" for their invaluable help in tracking
this down.

Signed-off-by: Jonathan Woithe <jwoithe@just42.net>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>