Data Analytics/Business Data Mining

You will mine text consisting of reviews for a particular product or service. Refer to Chapter 12 of Data Mining for the Masses for help on how to perform text mining using RapidMiner. Perform the following tasks:

  • Install the Text Processing 7.5.0 extension in RapidMiner. To do this, open RapidMiner, click on Extensions in the menu bar, then Marketplace (Updates and Extensions). Search for the Text Processing extension and install it.
  • Using your favorite search engine, locate a website or forum on the Internet where people have posted reviews for a particular product or service.
  • Copy and paste at least ten of these posts or comments into a text editor, saving each one as its own text document with a unique name.
  • Open a new, blank process in RapidMiner, and using the Read Documents operator, open each of your ten (or more) text documents containing the customer reviews you found.
  • Process these documents in RapidMiner. Be sure you tokenize and use other handlers in your sub-process as you deem appropriate/necessary. Experiment with n-grams and stems.
  • Use k-means clustering to group your documents into two, three, or more clusters. Output your word list as well. Take three screenshots: (1) final process stream, (2) clustering results, and (3) resulting word list of tokens and frequencies.
  • In your interpretation of results, answer the following:
    • Based on your word list, what seem to be the most common terms in your documents? Why do you think that is?
    • Based on your word list, are there some terms or phrases that show up in all or most of your documents? Why do you think these are so common?
    • Based on your clusters, what groups did you get? What are the common themes in each of your clusters?
    • How might the company that sold this product or service use your model to their advantage?
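
If it helps to see what the RapidMiner operators are doing conceptually, the tokenize, n-gram, word list, and k-means steps above can be sketched in plain Python. This is only an illustrative analog under stated assumptions, not RapidMiner's implementation: the sample reviews, the stopword set, and every function name are hypothetical, stemming is omitted, and the clustering uses a deterministic farthest-first start (chosen here for reproducibility) instead of whatever initialization your RapidMiner operator is configured to use.

```python
import math
import re
from collections import Counter

# Hypothetical sample reviews; in the assignment these come from your saved text files.
REVIEWS = [
    "Battery life is great and the battery charges fast",
    "Great battery, great screen, fast charging",
    "Shipping was slow and customer service was rude",
    "Slow shipping, rude service, never ordering again",
]

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"a", "and", "is", "the", "was", "of", "to", "it"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Return consecutive n-token phrases joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_list(docs):
    """Count total occurrences and the number of documents containing each token."""
    total, in_docs = Counter(), Counter()
    for tokens in docs:
        total.update(tokens)
        in_docs.update(set(tokens))
    return total, in_docs


def vectorize(docs, vocab):
    """Build one length-normalized term-frequency vector per document."""
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = [counts[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Basic k-means with a deterministic farthest-first start (an assumption of this sketch)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assign each document to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its assigned documents.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Tokenize, append bigrams, build the word list, then cluster into two groups.
docs = [tokenize(r) for r in REVIEWS]
docs = [toks + ngrams(toks, 2) for toks in docs]
total, in_docs = word_list(docs)
vocab = sorted(total)
labels = kmeans(vectorize(docs, vocab), k=2)

for term, count in total.most_common(5):
    print(term, "total:", count, "documents:", in_docs[term])
print("cluster labels:", labels)
```

The two counters correspond roughly to the total-occurrence and document-occurrence columns of a word list, which is what the first two interpretation questions ask you to compare. With real reviews the clusters will rarely separate as cleanly as in this toy data, so expect to try several values of k.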



Submission Instructions:

Please type up your homework using the homework template. You should include at least three screenshots: (1) final process stream, (2) clustering results, and (3) resulting word list. Remember to answer all questions in your interpretation of results.