Question


asked Mar 15, 2021 78.2k views

Write a function ngrams(n, tokens) that produces a list of all n-grams of the specified size from the input token list. Each n-gram should consist of a 2-element tuple (context, token), where the context is itself an (n-1)-element tuple comprised of the n-1 words preceding the current token. The sentence should be padded with n-1 "" tokens at the beginning and a single "" token at the end. If n = 1, all contexts should be empty tuples. You may assume that n ≥ 1.

>>> ngrams(1, 'abc')
[('~', 'a'), ('a', 'b'), ('b', 'c')]
>>> ngrams(2, 'abc')
[('~~', 'a'), ('~a', 'b'), ('ab', 'c')]

Anayansi asked


1 Answer


Chuck Lantz · Answer 1 · 2021-03-19T19:51:10+0000

Answer:

Step-by-step explanation:

Assuming the input is a string of space-separated words, such as x = "a b c d", we can use the following function:

    def ngrams(text, n):
        # Split the string into words on single spaces.
        words = text.split(' ')
        output = []
        # Slide a window of n words across the list.
        for i in range(len(words) - n + 1):
            output.append(words[i:i + n])
        return output

    ngrams('a b c d', 2)  # [['a', 'b'], ['b', 'c'], ['c', 'd']]

If you want those joined back into strings, you might call something like:

    [' '.join(x) for x in ngrams('a b c d', 2)]  # ['a b', 'b c', 'c d']

Lastly, this does not tally repeated n-grams. If your input were 'a a a a', you would need to count them in a dict:

    text = 'a a a a'
    grams = {}
    for g in (' '.join(x) for x in ngrams(text, 2)):
        grams.setdefault(g, 0)
        grams[g] += 1
    # grams == {'a a': 3}
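As an aside, the same tally can be written with collections.Counter from the Python standard library. This is only a sketch: the helper name count_ngrams is hypothetical and simply wraps the splitting and counting steps described above.

```python
from collections import Counter

def count_ngrams(text, n):
    # Hypothetical helper name, used only for this illustration.
    # Split on single spaces, as the function above does.
    words = text.split(' ')
    # Join each window of n words into one string and tally the occurrences.
    return Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))

print(count_ngrams('a a a a', 2))  # Counter({'a a': 3})
```

A Counter behaves like a dict, so the result can be indexed by n-gram string in the same way as the hand-built dictionary.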

Putting it all together:

    def ngrams(text, n):
        words = text.split(' ')
        output = {}
        for i in range(len(words) - n + 1):
            # Join each window of n words and count how often it occurs.
            g = ' '.join(words[i:i + n])
            output.setdefault(g, 0)
            output[g] += 1
        return output

    ngrams('a a a a', 2)  # {'a a': 3}
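Note that the functions in this answer work on a space-separated string and return word windows or counts, whereas the question asks for ngrams(n, tokens) returning (context, token) tuples with padding. A minimal sketch of that shape might look like the following. The padding strings "<s>" and "</s>" and the parameter names start and end are assumptions made for illustration, since the question as quoted does not show the padding tokens; substitute whatever your assignment specifies.

```python
def ngrams(n, tokens, start="<s>", end="</s>"):
    """Return a list of (context, token) pairs for the given token list.

    Each context is a tuple of the n-1 tokens preceding the current token.
    The default padding strings are placeholders assumed for this sketch.
    """
    # Pad with n-1 start tokens at the front and one end token at the back.
    padded = [start] * (n - 1) + list(tokens) + [end]
    return [
        (tuple(padded[i - (n - 1):i]), padded[i])
        for i in range(n - 1, len(padded))
    ]

print(ngrams(1, ["a", "b", "c"]))
# [((), 'a'), ((), 'b'), ((), 'c'), ((), '</s>')]
print(ngrams(2, ["a", "b", "c"]))
# [(('<s>',), 'a'), (('a',), 'b'), (('b',), 'c'), (('c',), '</s>')]
```

With n = 1 the slice is empty, so every context is an empty tuple, as the question requires.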
