203k views
3 votes
Write a BASH script words.sh that computes the number of occurrences of whitespace-separated (space, newline, etc.) words. The script reads lines of input from stdin until EOF is reached. Each line of input is to be broken into words. Each word (ignoring case and any characters that are not alphanumeric, i.e. not letters or digits) has a corresponding "counter" that is initialized to 1 when the word is first encountered and incremented by 1 every time the word appears again. You can ignore case by converting the uppercase letters of a word to lowercase letters.

When EOF is reached, the script outputs the words encountered in the input and the corresponding counts. The list of words and their counts should be sorted in ascending alphabetical order (hint: you might want to pipe the output of the loop displaying words and their counts to the utility sort). Example input and output are shown below:
stdin: this is a word and another word BASH scripting is fun, but bash is not for everyone?
stdout:
a 1
and 1
another 1
bash 2
but 1
everyone 1
for 1
fun 1
is 3
not 1
scripting 1
this 1
word 2

User Harambe
by
8.6k points

1 Answer

1 vote

Final answer:

The BASH script 'words.sh' reads from stdin, processes input ignoring case and non-alphanumeric characters, counts word occurrences, and outputs sorted words with counts.

Step-by-step explanation:

The BASH script words.sh can be built from standard UNIX commands: tr to normalize the text, sort to order the output, and either uniq -c or, as in the script below, a bash associative array to do the counting. The script reads lines from stdin until EOF (end of file) is reached, converts them to lowercase, strips non-alphanumeric characters, counts the occurrences of each word, and sorts the results alphabetically at the end.
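For comparison, the tr, sort, and uniq route can be sketched as a single pipeline with no explicit loop. This is only a sketch, not the answer's script: the function name count_words is made up for illustration, and it assumes punctuation may be treated as a word separator (so a token like don't would count as two words).

```shell
#!/bin/bash
# count_words: hypothetical helper name, not part of the original answer.
# Reads text on stdin and prints "word count" pairs in alphabetical order.
count_words() {
    tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -cs '[:alnum:]' '\n' |      # turn each run of non-alphanumerics into one newline
    sed '/^$/d' |                  # drop the empty line left by leading punctuation
    LC_ALL=C sort |                # group identical words next to each other
    uniq -c |                      # prefix each distinct word with its count
    awk '{ print $2, $1 }'         # reorder "count word" into "word count"
}
```

Called as count_words < input.txt, or with text piped in, it prints one word and its count per line.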

Here is the BASH script that performs the task described:

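#!/bin/bash
# Associative array (bash 4+) mapping each word to its number of occurrences
declare -A word_count
# Read stdin line by line until EOF; the || test keeps a last line that has no trailing newline
while IFS= read -r line || [[ -n $line ]]; do
    # Convert upper case to lower case and turn each run of non-alphanumeric characters into a newline
    for word in $(printf '%s\n' "$line" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n'); do
        # Increment the word's count (an unset entry counts as 0, so the first sighting yields 1)
        ((word_count[$word]++))
    done
done
# Output words and their counts, sorted
for word in "${!word_count[@]}"; do
    echo "$word ${word_count[$word]}"
done | sort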
Key steps in processing the input include converting uppercase letters to lowercase and stripping out non-alphanumeric characters to ensure words are uniformly counted, ignoring case and special characters. Finally, the script sorts the unique words alphabetically before outputting with their respective counts.
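The assignment can also be read as asking for non-alphanumeric characters inside a word to be deleted rather than treated as separators, so that a token such as don't becomes dont. A minimal sketch of that per-word normalization is shown here; the function name normalize_word is an assumption made for illustration and does not appear in the original script.

```shell
#!/bin/bash
# normalize_word: hypothetical helper name, not part of the original answer.
# Prints its first argument with non-alphanumeric characters deleted and
# uppercase letters folded to lowercase.
normalize_word() {
    printf '%s\n' "$1" |
    tr -cd '[:alnum:]\n' |         # delete everything except letters, digits, and the newline
    tr '[:upper:]' '[:lower:]'     # fold case
}
```

Inside a loop over whitespace-separated tokens, word=$(normalize_word "$token") would give the key to count, and empty results (tokens made only of punctuation) should be skipped.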
User ZeroSkillz
by
7.3k points