- UID
- 2
- 积分
- 8682
- 帖子
- 2905
- 主题
- 199
- 论坛币
- 11740
- 威望
- 16
- EP值
- 2349
- MP值
- 15
- 阅读权限
- 200
- 注册时间
- 2011-8-3
- 在线时间
- 2597 小时
- 最后登录
- 2024-8-28
|
suppose that we've got a massive amount of chunks (small ones, but come in random order)
How can we linearize them (the generation of linearized chunk index), there are two algorithms to achieve this goal
ONE
1. create a list for holding index streams
2. retrieve one index from the un-sorted index array, check if the index can be attached to the first index stream of the index streams, if so, just append it, otherwise, check the next index stream. If none of the existing index streams in the index streams list fit the requirement, we just create a new index stream, put the index into the newly created index stream, then attach the newly created index stream to the index streams list
3. loop
for the best case, the efficiency is about O(n)
for the worst case, the efficiency is about O(n^2) (we have to check the first m index streams, before we know we have to create the m + 1 one! and what's worse, we have to create n index streams!, that's about n * n / 2.
average performance is about O(m * n), m is the final number of index streams
TWO
1. create a list for holding index streams (named indexStreams)
2. create a list for holding the index stream whose 'last time' is the smallest amongst the index streams in the index streams list, named minIndexStream
3. retrieve one index from the un-sorted index array, check if the index can be attached to the minIndexStream, if cannot we simply create a new index stream, put the index into the newly created index stream, then attach the newly created index stream to the index stream list.
with the help of the minIndexStream, we can improve the performance to O(n), but the problem is how can we maintain such a minIndexStream?
indexStreams = []
newIndexStream = []
indexStreams.append(newIndexStream)
minIndexStream = 0
for an affine index:
oldIndexStream = indexStreams[minIndexStream]
if oldIndexStream.last < index.first:
oldIndexStream.append(index)
# how to determine the minIndexStream then?
else:
newIndexStream = [index]
indexStreams.append(newIndexStream)
# the minIndexStream stays the same, because it is still the smallest one
about maintaining a data structure for keeping the index and the 'last time', we can think of the AVL tree or Red-Black tree, which enables us to insert, delete or query a node in O(log n) time.
|
|