Explain how you could use it to parallelize the counting across n machines.

What would the total required time to compute the counts for each pair be once you've used your hash function to divide the pair tuples across 6 separate disks (which can write in parallel)? Please show some work/reasoning. A simple numerical answer with no reasoning will not count.

User Kolichikov
by
7.6k points

1 Answer

Final answer:

To parallelize the counting across n machines, assign each pair tuple to machine hash(pair) mod n. Every occurrence of a given pair then lands on the same machine, so each machine can count its share independently. With the tuples divided across 6 disks that write in parallel, the total time is set by the most heavily loaded disk: about one sixth of the single-disk time if the hash spreads the tuples evenly.

Step-by-step explanation:

Apply the hash function to each pair tuple and use the result modulo n (here n = 6) to choose the machine or disk that receives it. Because identical pairs always hash to the same value, all copies of a pair end up on one disk, and each machine counts only its own portion. The disks work simultaneously, so the total time is the time taken by the slowest disk, not the sum over all disks. That in turn depends on how evenly the hash function distributes the tuples and on how fast each disk can write and count its share.
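A minimal Python sketch of this partitioning idea follows. The function names `partition_of` and `count_pairs_partitioned` are hypothetical, and CRC32 is an assumed stand-in for whatever hash function the original question had in mind. The partitions are counted in a simple loop here; on real hardware each one would go to its own machine or disk.

```python
import zlib
from collections import Counter

def partition_of(pair, n):
    """Map a pair to one of n partitions using a stable hash (CRC32 is an assumption)."""
    key = f"{pair[0]}\x00{pair[1]}".encode("utf-8")
    return zlib.crc32(key) % n

def count_pairs_partitioned(pairs, n):
    # Route every pair to its partition; identical pairs always share a partition.
    partitions = [[] for _ in range(n)]
    for pair in pairs:
        partitions[partition_of(pair, n)].append(pair)

    # Each partition can be counted independently (one per machine or disk).
    partial_counts = [Counter(part) for part in partitions]

    # Partitions hold disjoint sets of pairs, so merging never sums across partitions.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return partitions, total
```

A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings and would send the same pair to different partitions on different machines.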

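For example, suppose the hash function distributes the pair tuples evenly across the 6 disks. If there are 600 pair tuples in total, each disk receives approximately 100 tuples. If each disk takes 1 second to compute the counts for its 100 tuples, the total required time is about 1 second, because all 6 disks work in parallel. A single disk processing all 600 tuples at the same rate would take 6 seconds, so the speedup is roughly 6x. If the hash is skewed, the total time is that of the disk holding the most tuples.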
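The timing can be checked with a small sketch. Disks that work in parallel finish when the slowest one finishes, whereas one disk doing all the work takes the sum of the individual times. The function names and the rate of 100 tuples per second are assumptions taken from the example figures, not measured values.

```python
def parallel_time(loads, tuples_per_second):
    """Wall-clock time when every disk processes its own load at the same time."""
    return max(loads) / tuples_per_second

def serial_time(loads, tuples_per_second):
    """Time if a single disk had to process every load one after another."""
    return sum(loads) / tuples_per_second

# 600 tuples spread evenly over 6 disks, each handling 100 tuples per second (assumed rate).
even_loads = [100] * 6
print(parallel_time(even_loads, 100.0))  # 1.0 second
print(serial_time(even_loads, 100.0))    # 6.0 seconds

# A skewed hash: one disk receives most of the tuples and dominates the total time.
skewed_loads = [350, 50, 50, 50, 50, 50]
print(parallel_time(skewed_loads, 100.0))  # 3.5 seconds
```

The skewed case shows why an even hash matters: the total is still 600 tuples, but the parallel time rises from 1 second to 3.5 seconds.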

User Tomper
by
6.8k points