124k views
2 votes
You need to sort a file of n GB file stored on a hard-drive. Your RAM contains only 5GB. You have a lightning-fast CPU, but writing or reading from the disk (a single I/O operation) is slow. So we estimate the number of I/O, and ignore CPU time. For simplicity, assume your disk is partition into blocks, each of size 1GB. In each I/O operation, you could read or write one block. The input file occupies the blocks b_1 ellipsis b_n. Explain how to sort the file, using O(nlogn) I/O operations. You could assume that your hard-drive contains n blocks of free space f_1, f_2, ellipsis, f_n Ignore caching issues.

1 Answer

5 votes

Answer:

It is given that:

  • There are "n" GB data need to be sorted.
  • The data is divided into blocks "n" blocks each of 1 GB.
  • Thus data need to be sorted based on the block contents.
  • There are n" blocks each of 1 GB is free on the hard-disk.

Steps to follow to perform the sorting of the data stored In the hard-drive:

  • Divide the blocks to be sorted into set of five blocks, here divide the blocks into n/2
  • sets.
  • 5 GB RAM can access the data fast and perform the sort of the each 2 block sets.
  • Now each of the 2 block sets are sorted based on the block number.
  • Take adjacent two sorted individual 2-blocks from the hard disk and perform the sort operation on the combined block and make them sorted.
  • Continue the merging of individually sorted blocks from 2, 4, 8, 16. n/2.
  • After "log n " merge operations the blocks will be sorted.

Explanation of complexity of the procedure:

  • The data is divided into chunks of 5 block sets.
  • Each set need to be sorted individually, and 5 GB ram can handle the sorting or each two blocks, thus it will take 2 I/O operations in each block sort.
  • There are "n/2" blocks will be there with 2 I/0 operations each collectively it will take n I/O operations.
  • The merging of the sorted blocks will take "
    log_(2)n" operations.
  • The comparison and copying operation may have to be performed at each merging stage.
  • Thus total I/O operation required will be "n*logn".

User Michael Krikorev
by
5.2k points