TCAX Subtitle Effects Tool Official Forum | ASS | TCAS | Python | Aegisub | Lua

[libtcas] About the new techniques in libtcas - frame cache

Administrator

TCAX Dev.

Posted on 2011-10-5 21:43:33
Concepts of the frame cache technique.

1. Use worker threads (background threads) to generate the frames.
2. The consumer thread (the main thread) takes the frames away.
3. The number of worker threads is limited (2 is suggested), so a simple thread pool implementation may be required.
4. The number of cached frames is limited (5 is suggested), so a circular queue backed by a contiguous array may be required.
5. If no frames are cached, the consumer thread has to wait.
6. If there is no room for new frames to be cached, the worker threads wait.
7. The consumer thread always takes the first frame in the frame cache queue. If its frame number does not match the requested one, all cached frames are dropped; otherwise, only the first frame is dropped.
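Points 4 to 7 can be sketched as a small array-based circular queue. This is only a single-threaded illustration of the queue logic, not the libtcas code: the names fc_frame, fc_queue, fc_queue_push and fc_queue_take are hypothetical, a frame is reduced to its frame number, and the places where a real thread would wait are represented by return values.

```c
#include <assert.h>
#include <stddef.h>

#define FC_MAX_FRAMES 5   /* suggested cache size from point 4 */

/* Hypothetical cached frame: only the frame number matters here. */
typedef struct {
    int frame_no;
} fc_frame;

/* Circular queue stored in one contiguous array (no linked nodes). */
typedef struct {
    fc_frame items[FC_MAX_FRAMES];
    size_t head;    /* index of the oldest cached frame */
    size_t count;   /* number of frames currently cached */
} fc_queue;

static void fc_queue_init(fc_queue *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Returns 1 on success, 0 if the queue is full
 * (point 6: a worker thread would wait at this point). */
static int fc_queue_push(fc_queue *q, int frame_no) {
    assert(q != NULL);
    if (q->count == FC_MAX_FRAMES)
        return 0;
    size_t tail = (q->head + q->count) % FC_MAX_FRAMES;
    q->items[tail].frame_no = frame_no;
    q->count++;
    return 1;
}

/* Point 7: look at the first cached frame.
 * Returns  1 on a match (only the first frame is dropped),
 *          0 on a mismatch (all cached frames are dropped),
 *         -1 if nothing is cached (point 5: the consumer would wait). */
static int fc_queue_take(fc_queue *q, int wanted) {
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    if (q->items[q->head].frame_no == wanted) {
        q->head = (q->head + 1) % FC_MAX_FRAMES;
        q->count--;
        return 1;
    }
    q->head = 0;
    q->count = 0;
    return 0;
}
```

With frames 0 to 4 cached, fc_queue_take(&q, 0) drops only frame 0, while a later fc_queue_take(&q, 100) finds frame 1 at the head and empties the whole queue.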


How can the frame cache improve the overall performance?

1. Background threads can make fuller use of the CPU (previewing a TCAS FX usually does not take up 100% of the CPU; around 35% is the normal level).
2. I/O takes up a considerable amount of time. If it can be done simultaneously with the generation or the overlay of a frame, we can save some time.
3. When previewing an FX, we usually just follow the frame sequence; jumping is rare.


Process of the new renderer (applying the frame cache technique).

1. Initialize the frame cache struct in the main thread.
2. The worker threads then start to generate frames (starting from the first valid frame, which may not be 0), and the generated frames are stored in the frame cache struct.
3. In the main thread, a query function asks the frame cache struct for a specific frame, and the first cached frame is returned. If the returned frame is not the one requested, all frames in the frame cache struct are dropped; if it matches, only the first frame is dropped. If no frames are cached, the main thread has to wait for the very first frame to arrive.
4. Once there is room for a new frame, the worker threads start to generate the next frames in sequence.
5. If the main thread wants to exit, it signals the worker threads to exit after they complete their current work (a frame that is being generated is not aborted). After the worker threads have exited, the main thread exits.
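The process above could look roughly like the following sketch. It is not the libtcas implementation, and several things are assumptions made for illustration: POSIX threads are used (the real renderer may use a different threading API), a frame is reduced to one int, all names (frame_cache, frame_cache_start, frame_cache_get, frame_cache_stop) are hypothetical, a single condition variable is used to keep the code short, and on a mismatch the cache restarts generation from the requested frame and waits for it. The epoch counter is an addition of this sketch so that frames still being generated during a flush are discarded.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_SIZE  5   /* suggested maximum number of cached frames */
#define NUM_WORKERS 2   /* suggested number of worker threads */

typedef struct {
    int frame_no[CACHE_SIZE];   /* frame number assigned to each slot */
    int pixels[CACHE_SIZE];     /* stand-in for real frame data */
    int ready[CACHE_SIZE];      /* 1 once the slot's frame is generated */
    int head, reserved;         /* oldest slot; slots handed out to workers */
    int next_frame;             /* next frame number to generate */
    unsigned epoch;             /* bumped on every flush (assumption) */
    int stop;                   /* exit request from the main thread */
    pthread_mutex_t lock;
    pthread_cond_t changed;
    pthread_t workers[NUM_WORKERS];
} frame_cache;

/* Placeholder for the real frame generation. */
static int generate_frame(int n) { return n * 10; }

static void *worker_main(void *arg) {
    frame_cache *fc = arg;
    pthread_mutex_lock(&fc->lock);
    while (!fc->stop) {
        if (fc->reserved == CACHE_SIZE) {        /* no room: wait */
            pthread_cond_wait(&fc->changed, &fc->lock);
            continue;
        }
        /* Reserve a slot first so frames stay in order with 2 workers. */
        int slot = (fc->head + fc->reserved++) % CACHE_SIZE;
        int n = fc->next_frame++;
        unsigned epoch = fc->epoch;
        fc->frame_no[slot] = n;
        fc->ready[slot] = 0;
        pthread_mutex_unlock(&fc->lock);
        int px = generate_frame(n);              /* heavy work, lock released */
        pthread_mutex_lock(&fc->lock);
        if (epoch == fc->epoch) {                /* discard if flushed meanwhile */
            fc->pixels[slot] = px;
            fc->ready[slot] = 1;
            pthread_cond_broadcast(&fc->changed);
        }
    }
    pthread_mutex_unlock(&fc->lock);
    return NULL;
}

static void frame_cache_start(frame_cache *fc, int first_frame) {
    fc->head = fc->reserved = 0;
    fc->next_frame = first_frame;
    fc->epoch = 0;
    fc->stop = 0;
    pthread_mutex_init(&fc->lock, NULL);
    pthread_cond_init(&fc->changed, NULL);
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&fc->workers[i], NULL, worker_main, fc);
        assert(rc == 0);
        (void)rc;
    }
}

/* Called from the main thread; blocks until the wanted frame is ready. */
static int frame_cache_get(frame_cache *fc, int wanted) {
    pthread_mutex_lock(&fc->lock);
    for (;;) {
        int first = fc->reserved > 0 ? fc->frame_no[fc->head] : fc->next_frame;
        if (first != wanted) {                   /* mismatch: drop everything */
            fc->epoch++;
            fc->head = fc->reserved = 0;
            fc->next_frame = wanted;
            pthread_cond_broadcast(&fc->changed);
        }
        if (fc->reserved > 0 && fc->ready[fc->head])
            break;
        pthread_cond_wait(&fc->changed, &fc->lock);
    }
    int px = fc->pixels[fc->head];               /* match: drop only the first */
    fc->head = (fc->head + 1) % CACHE_SIZE;
    fc->reserved--;
    pthread_cond_broadcast(&fc->changed);        /* room for a new frame */
    pthread_mutex_unlock(&fc->lock);
    return px;
}

static void frame_cache_stop(frame_cache *fc) {
    pthread_mutex_lock(&fc->lock);
    fc->stop = 1;
    pthread_cond_broadcast(&fc->changed);
    pthread_mutex_unlock(&fc->lock);
    for (int i = 0; i < NUM_WORKERS; i++)        /* workers finish their frame */
        pthread_join(fc->workers[i], NULL);
    pthread_cond_destroy(&fc->changed);
    pthread_mutex_destroy(&fc->lock);
}
```

Reserving the slot before generating is one way to keep frames in sequence when two workers finish out of order; a worker that generates frame n+1 faster than its peer generates frame n simply leaves its slot marked ready until the head catches up.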


Implementation tips

1. Use a new .h header and a new .c file to contain the code of the frame cache facility.
2. The frame cache is only a utility on top of the libtcas core (tcas.c, tcas.h), just like hla_mt.c and hla_mt_mm.c.
3. So the .c file can be named hla_frame_cache.c.


Issues

1. What should be stored in the frame cache struct?
    a) a thread pool
    b) the maximum number of cached frames
    c) a queue that stores the cached frames (use a circular queue backed by a contiguous array rather than a linked list)
    d) etc.

2. The vital functions
    a) libtcas_frame_cache_init, to initialize a frame cache struct
    b) libtcas_frame_cache_run, to actually run the worker threads of the frame cache in the background
    c) libtcas_frame_cache_fin, to finalize the frame cache struct
    d) libtcas_frame_cache_get, to get the specified frame from the frame cache. If the frame is not yet available, the function waits for it to be completed. Taking the frame away frees room in the cache, which prompts the worker threads to generate the following frames.

3. Should we use a thread pool, or just plain multiple threads?


Additional Notes

1. The main reason that the frame cache feature can speed up the overall rendering is that the background threads make full use of the spare time while the main thread is doing the overlaying. (Frame generation and overlaying are two separate processes, hence multi-threading can improve the performance.)



Administrator

TCAX Dev.

Posted on 2011-10-8 00:59:51

More bottlenecks to consider

I/O is always the bottleneck, especially the first time a file is read. (The first read of a file is likely to take about ten times as long as subsequent reads, because the OS usually caches recently read files to speed up later access.) We can be sure that the performance can be improved: we are not reading the file all the time, only when data is needed to generate a new frame, so we can read ahead and cache the data while the I/O is idle, which saves the time spent on reading files.

Two points should be kept in mind:
1. We can do the caching only when there is free CPU time,
2. and only when the I/O is not busy. In other words, make full use of the spare time.

Test results (all times in ms):

File (frames rendered)         Run      Initialization   Total rendering   I/O time
tcas_test_illusion (2173)      first    3214             18673             6925
                               second   78               15974             565
                               third    78               16911             408

11eyes_op_720p.tcas (1733)     first    2715             16520             5346
                               second   31               13541             420

tcas_test_bezier (2000)        first    5195             74412             52843
                               second   125              20826             654
                               third    125              21481             953
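How the I/O time in these results was collected is not shown. One plausible way, given here only as a sketch, is to wrap every read call and accumulate the elapsed time. The names io_stats, now_ms and timed_fread are hypothetical, and timespec_get (C11) is used as the clock, which measures wall-clock time rather than CPU time.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Accumulated time spent inside read calls, in milliseconds. */
typedef struct {
    double io_ms;
} io_stats;

/* Current wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Same contract as fread, but adds the elapsed time to st->io_ms. */
static size_t timed_fread(void *buf, size_t size, size_t count,
                          FILE *fp, io_stats *st) {
    assert(fp != NULL && st != NULL);
    double t0 = now_ms();
    size_t got = fread(buf, size, count, fp);
    st->io_ms += now_ms() - t0;
    return got;
}
```

Replacing the renderer's read calls with such a wrapper and printing io_ms next to the total rendering time would give figures comparable to the ones listed above.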




Thanks to the OS's default file cache, a TCAS FX is likely to preview more smoothly once it has been previewed before. That does not help the first time the TCAS FX is previewed, so we have to provide our own cache facility. In summary, I call the new technique file cache; it again improves the overall performance by reducing the time spent on I/O. For more information about file cache, see this thread.





Administrator

TCAX Dev.

Posted on 2011-10-7 21:23:39

Benchmark of the frame cache performance

Sample TCAS file for testing:

RIXE\examples\sophisticated\tcas_test_bezier\test.tcas

TCAS info:

The TCAS file contains both normal frames and key frames.

Total number of frames: 1918


Test results (time unit: ms), two runs per configuration:

Queue size (cached frames)      Run 1      Run 2
no frame cache (the old way)    59467      60464
1 frame                         44554      46769
5 frames                        45193      43805
10 frames                       44788      43556
20 frames                       43586      43056
50 frames                       42152      42167


Note:
The data listed above was generated by the debug versions of the renderers. With release versions, the relative difference is bigger (the renderer with no cache takes about 3000 ms, while the one with cache takes about 2000 ms). In the following, I will use the release version rather than the debug version.


Conclusion:
From the test results above, we can see that the more frames are cached, the faster it renders. But the difference becomes minor once the frame cache technique is already applied with a reasonable queue size. There is also a memory issue: the more frames we want to cache, the more memory we require. For example, a 1280 * 720 TCAS frame needs around 3.5 MB of memory, and a 1920 * 1080 TCAS frame needs around 8 MB! Another trade-off deserves attention: if we jump between frames rather than play the video in a continuous sequence, the frames we have cached will probably be unneeded, so we have to re-cache the frames from the jump point. So the more we cache, the more we waste on a seek operation.

From experience, 20 frames may be just about right.
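The memory figures quoted above are consistent with 4 bytes per pixel, which is an assumption here (for example RGBA with 8 bits per channel); the post does not state the pixel format. Under that assumption, the memory needed for a given queue size can be estimated as follows. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 bytes per pixel (e.g. RGBA, 8 bits per channel). */
#define BYTES_PER_PIXEL 4

/* Memory needed for one uncompressed frame, in bytes. */
static size_t frame_bytes(size_t width, size_t height) {
    assert(width > 0 && height > 0);
    return width * height * BYTES_PER_PIXEL;
}

/* Memory needed for a full cache of `frames` frames, in bytes. */
static size_t cache_bytes(size_t width, size_t height, size_t frames) {
    return frame_bytes(width, height) * frames;
}
```

frame_bytes(1280, 720) is 3,686,400 bytes (about 3.5 MB) and frame_bytes(1920, 1080) is 8,294,400 bytes (about 7.9 MB). A queue of 20 frames at 1280 * 720 therefore needs 73,728,000 bytes, roughly 70 MB.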


