
The tokens are passed through a Lucene ____________ to produce n-grams of the desired length.

(a) ShngleFil
(b) ShingleFilter
(c) SingleFilter
(d) Collfilter

This question is from Mahout with Hadoop, in the chapter on Apache Spark, Flume, Lucene, Hama, HCatalog, Mahout, Drill, Crunch and Thrift of Hadoop.

Answer»

The correct choice is (b) ShingleFilter

Explanation: The tools in which the collocation identification algorithm is embedded either consume tokenized text as input or allow the user to specify an implementation of the Lucene Analyzer class to perform tokenization; the resulting tokens are then passed through a ShingleFilter to form the n-grams.
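To make the idea concrete, here is a minimal Python sketch of what a shingle filter does conceptually: it slides over a token stream and emits word n-grams ("shingles") for each size in a configured range. This is only an illustration of the technique, not Lucene's actual implementation; the function name and parameters are chosen for this example (Lucene's real ShingleFilter also emits unigrams by default and handles token attributes, which are not modeled here).

```python
def shingles(tokens, min_size=2, max_size=2, sep=" "):
    """Emit word n-grams ("shingles") of each size in [min_size, max_size],
    in token order. Illustrative sketch of the shingling technique only;
    not a faithful port of Lucene's ShingleFilter."""
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                # join n consecutive tokens starting at position i
                out.append(sep.join(tokens[i:i + n]))
    return out

print(shingles(["please", "divide", "this", "sentence"], 2, 3))
```

With min and max size of 2 and 3, the sketch yields bigrams and trigrams such as "please divide" and "please divide this", which is the kind of n-gram stream the collocation identification step consumes.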


