contextpro.feature_extraction module
-
contextpro.feature_extraction.batch_get_ngrams(tokens: List[List[str]], ngram_size: int = 1) → List[List[str]]
Prepare n-grams from the provided list of token lists.
- Parameters
tokens (List[List[str]]) – list of token lists, each representing a single document
ngram_size (int) – size of n-grams to return, by default 1 (unigrams)
- Returns
list of n-gram lists, one per input document
- Return type
List[List[str]]
- Raises
ValueError – if the provided ‘tokens’ is not a list of lists of strings
Examples
>>> from contextpro.feature_extraction import batch_get_ngrams
>>> tokens = [
...     ["my", "name", "is", "spiderman"],
...     ["she", "lives", "in", "australia"],
... ]
>>> batch_get_ngrams(tokens, ngram_size=2)
[
    ["my name", "name is", "is spiderman"],
    ["she lives", "lives in", "in australia"],
]
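The documented behaviour can be approximated without the library. The following is a minimal sketch, not contextpro's actual implementation: the name `sketch_batch_get_ngrams` is hypothetical, the validation logic is only inferred from the "Raises" entry above, and `ngram_size` is assumed to be at least 1.

```python
from typing import List


def sketch_batch_get_ngrams(
    tokens: List[List[str]], ngram_size: int = 1
) -> List[List[str]]:
    """Hypothetical stand-in for batch_get_ngrams; not the library's code."""
    # Reject anything that is not a list of lists of strings
    # (assumed to mirror the documented ValueError condition).
    if not isinstance(tokens, list) or not all(
        isinstance(doc, list) and all(isinstance(tok, str) for tok in doc)
        for doc in tokens
    ):
        raise ValueError("'tokens' must be a list of lists of strings")

    result = []
    for doc in tokens:
        # Slide a window of ngram_size tokens over each document on its own,
        # so no n-gram spans a document boundary.
        windows = [doc[i:i + ngram_size] for i in range(len(doc) - ngram_size + 1)]
        result.append([" ".join(window) for window in windows])
    return result


docs = [
    ["my", "name", "is", "spiderman"],
    ["she", "lives", "in", "australia"],
]
bigrams = sketch_batch_get_ngrams(docs, ngram_size=2)
print(bigrams)
```

Processing each document separately is what keeps the result nested: the output has one inner list per input document, in the same order.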
-
contextpro.feature_extraction.get_ngrams(tokens: List[str], ngram_size: int = 1) → List[str]
Prepare n-grams from the provided list of tokens.
- Parameters
tokens (List[str]) – list of tokens
ngram_size (int) – size of n-grams to return, by default 1 (unigrams)
- Returns
list of n-grams
- Return type
List[str]
- Raises
ValueError – if the provided ‘tokens’ is not a list of strings
Examples
>>> from contextpro.feature_extraction import get_ngrams
>>> tokens = ["my", "name", "is", "dr", "jekyll"]
>>> get_ngrams(tokens, ngram_size=2)
["my name", "name is", "is dr", "dr jekyll"]
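For a single token list, the same sliding-window idea can be written with `zip` over staggered copies of the list. This is an illustrative sketch only: `sketch_get_ngrams` is a hypothetical name, and the handling of an `ngram_size` larger than the token list (an empty result) is a property of this sketch, not a documented guarantee of contextpro.

```python
from typing import List


def sketch_get_ngrams(tokens: List[str], ngram_size: int = 1) -> List[str]:
    """Hypothetical stand-in for get_ngrams; not the library's code."""
    # Assumed to mirror the documented ValueError condition.
    if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
        raise ValueError("'tokens' must be a list of strings")

    # Build ngram_size copies of the list, each shifted one token further.
    # zip stops at the shortest copy, which yields exactly the full windows.
    shifted = [tokens[i:] for i in range(ngram_size)]
    return [" ".join(gram) for gram in zip(*shifted)]


tokens = ["my", "name", "is", "dr", "jekyll"]
unigrams = sketch_get_ngrams(tokens)
bigrams = sketch_get_ngrams(tokens, ngram_size=2)
too_long = sketch_get_ngrams(tokens, ngram_size=6)
print(unigrams, bigrams, too_long)
```

A list of n tokens produces n - ngram_size + 1 n-grams, so the five tokens here give four bigrams, and a window wider than the list gives none.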