You will modify the text analysis code so that it does the following:

  • Processes the words for better analysis
  • Provides recommendations based on two literary works that you select
  • Uses the proper title when referring to the literary works

Text Processing

Modify the script so that it does at least two of the following for better comparisons:

  • Strip punctuation from each word
  • Remove words with little meaning (e.g. “the”, “a”)
  • Make all words lower case
  • Employ a stemmer
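
One possible sketch of the text-processing step covers three of the four options: lower-casing, stripping punctuation, and removing stop words. It assumes a small hand-picked stop-word list (STOP_WORDS below is illustrative, not a standard list), and the helper names process_word and process_words are hypothetical. A stemmer could be added as a fourth step, for example NLTK's PorterStemmer, but that requires installing the nltk package, so it is left out here.

```python
import string

# Illustrative stop-word list; an assumption, not a standard list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "i"}

# ASCII punctuation plus curly double and single quotes, which appear in many e-texts.
_PUNCTUATION = string.punctuation + "\u201c\u201d\u2018\u2019"
# Translation table that deletes every character in _PUNCTUATION.
_STRIP_TABLE = str.maketrans("", "", _PUNCTUATION)


def process_word(word):
    """Return a normalized form of word, or None if it should be skipped."""
    # Lower-case first so "The" and "the" count as the same word.
    word = word.lower()
    # Remove punctuation anywhere in the token ("whale," -> "whale").
    word = word.translate(_STRIP_TABLE)
    # Skip tokens that were pure punctuation, and stop words.
    if not word or word in STOP_WORDS:
        return None
    return word


def process_words(words):
    """Normalize a list of raw tokens, dropping the skipped ones."""
    result = []
    for w in words:
        processed = process_word(w)
        if processed is not None:
            result.append(processed)
    return result
```

In analyze(), the loop over dataList could then iterate over process_words(dataList) instead, so that only normalized words are counted and added to word_set. Note that stripping apostrophes turns "Ahab's" into "ahabs"; that is a simplification, not a linguistic rule.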

Recommendation

Choose two literary works (e.g. Pride and Prejudice, Jane Eyre) and use them to recommend similar works (“If you like Pride and Prejudice, you will also like these works…”). Your program should go through the remaining works and list them under the choice with the higher similarity index. For example, if Dracula is more similar to Jane Eyre than Pride and Prejudice, Dracula should be listed under Jane Eyre.
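
A minimal sketch of the recommendation step follows. It assumes the word-count tables have already been built the way analyze() builds them (doc_table maps a file name to a word-count dict, word_set holds every word seen). The names recommend and print_recommendations are hypothetical; similarity is repeated here, with a guard against empty tables, so the block runs on its own. Ties go to the first choice, and each list is sorted from most to least similar; both are design assumptions, not requirements from the assignment.

```python
import math


def similarity(tableA, tableB, words):
    """Cosine similarity between two word-count tables over all words."""
    ab = a2 = b2 = 0
    for w in words:
        a = tableA.get(w, 0)
        b = tableB.get(w, 0)
        ab += a * b
        a2 += a * a
        b2 += b * b
    if a2 == 0 or b2 == 0:
        return 0.0  # an empty table would divide by zero; treat as dissimilar
    return ab / (math.sqrt(a2) * math.sqrt(b2))


def recommend(doc_table, word_set, choice_a, choice_b):
    """Assign every other work to whichever choice it is more similar to.

    Returns {choice_a: [(fname, sim), ...], choice_b: [...]}, each list
    sorted from most to least similar. Ties go to choice_a (arbitrary).
    """
    groups = {choice_a: [], choice_b: []}
    for fname in doc_table:
        if fname in (choice_a, choice_b):
            continue  # don't recommend a chosen work to itself
        sim_a = similarity(doc_table[fname], doc_table[choice_a], word_set)
        sim_b = similarity(doc_table[fname], doc_table[choice_b], word_set)
        if sim_a >= sim_b:
            groups[choice_a].append((fname, sim_a))
        else:
            groups[choice_b].append((fname, sim_b))
    for works in groups.values():
        works.sort(key=lambda pair: pair[1], reverse=True)
    return groups


def print_recommendations(groups):
    """Print each group in the "If you like X, ..." format."""
    for choice, works in groups.items():
        print("If you like {}, you will also like these works:".format(choice))
        for fname, sim in works:
            print("    {} ({:.2f})".format(fname, sim))
```

For example, recommend(doc_table, word_set, "pride_and_prejudice.txt", "jane_eyre.txt") would return the remaining works split between the two choices, ready to pass to print_recommendations.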

Report by Titles

Modify your recommendation program so that it reports the titles of the works rather than their file names. To do this, write a program that reads in the titles.txt file and creates a dictionary that looks up the title using the file name. This dictionary should then be used to report the works by their title instead of their file name.
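
A sketch of the title lookup follows. It assumes titles.txt has the layout that build_title_file() writes: a file-name line followed by a title line, repeated. It also assumes the file is UTF-8; if titles.txt was written with the platform's default encoding, the encoding argument must match that instead. The names parse_titles, load_titles and display_title are hypothetical. Parsing is kept separate from file reading so it can be checked without a titles.txt on disk.

```python
def parse_titles(lines):
    """Map file name -> title from alternating file-name / title lines.

    Blank lines are skipped and surrounding whitespace (including the
    trailing newline) is stripped from every line.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    # Pair line 0 with line 1, line 2 with line 3, and so on.
    return dict(zip(cleaned[0::2], cleaned[1::2]))


def load_titles(path="titles.txt"):
    """Read titles.txt (assumed UTF-8) and return the file-name -> title dict."""
    with open(path, "r", encoding="utf8") as fd:
        return parse_titles(fd)


def display_title(fname, titles):
    """Return the title for fname, falling back to the file name itself."""
    return titles.get(fname, fname)
```

With titles = load_titles(), any report line that previously printed fname can print display_title(fname, titles) instead. Because pairs are formed by position, a missing title line would shift every pair after it, so the file should be regenerated rather than hand-edited if it gets out of step.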

Code:

import os
import math


def count_word(table, word):
    """For the word entry in the table, increment its count or init to 1."""
    if word in table:
        table[word] += 1
    else:
        # initialize count of word to 1
        table[word] = 1

def analyze():
    """Read all texts from the docs folder and report similarity
    comparisons among all pairs."""
    doc_table = dict()
    word_set = set()
    os.chdir('docs')
    fileList = os.listdir()
    for fname in fileList:
        print("Opening " + fname)
        with open(fname, "r", encoding="utf8") as fd:
            data = fd.read()
        doc_table[fname] = dict()
        print("splitting")
        dataList = data.split()
        print("{} has {} words".format(fname, len(dataList)))
        for word in dataList:
            word_set.add(word)
            count_word(doc_table[fname], word)
    os.chdir('..')  # return to parent directory
    # compare every pair of works (including each work with itself)
    for fname in fileList:
        for fname2 in fileList:
            sim = similarity(doc_table[fname], doc_table[fname2], word_set)
            print("{:.2f} : {} vs. {}".format(sim, fname, fname2))

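def build_title_file():
    """Create titles.txt from the "Title: " line of each work in the docs folder."""
    # Write UTF-8 explicitly so the output encoding matches the utf8 input
    # instead of depending on the platform default.
    tfd = open("titles.txt", "w", encoding="utf8")
    os.chdir('docs')
    fileList = os.listdir()
    for fname in fileList:
        print("Opening " + fname)
        with open(fname, "r", encoding="utf8") as fd:
            for line in fd:
                if line.startswith("Title: "):
                    tfd.write(fname + "\n")
                    tfd.write(line[7:])  # drop the 7-character "Title: " prefix
                    break
    os.chdir("..")  # return to parent directory
    tfd.close()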

def similarity(tableA, tableB, words):
    """Return the cosine similarity between tableA and tableB over all words."""
    ab = 0
    a2 = 0
    b2 = 0
    for w in words:
        ab += tableA.get(w, 0) * tableB.get(w, 0)
        a2 += tableA.get(w, 0) * tableA.get(w, 0)
        b2 += tableB.get(w, 0) * tableB.get(w, 0)
    # an empty table would otherwise cause a division by zero
    if a2 == 0 or b2 == 0:
        return 0.0
    return ab / (math.sqrt(a2) * math.sqrt(b2))
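
The given analyze() only prints the pairwise comparisons and does not return its tables, so a recommendation step has nothing to work with. One way around this, sketched below under the assumption that the works live in a docs folder next to the script, is to split loading into functions that return doc_table and word_set. The names read_texts, build_tables and count_words are hypothetical. Joining paths with os.path.join also avoids os.chdir, which would leave the working directory changed if an exception were raised mid-loop.

```python
import os


def count_words(words):
    """Return a dict mapping each word to how many times it occurs."""
    table = {}
    for word in words:
        table[word] = table.get(word, 0) + 1
    return table


def build_tables(texts, normalize=None):
    """Build (doc_table, word_set) from a dict of file name -> raw text.

    normalize, if given, is applied to each document's list of words
    before counting (e.g. lower-casing or stripping punctuation).
    """
    doc_table = {}
    word_set = set()
    for fname, data in texts.items():
        words = data.split()
        if normalize is not None:
            words = normalize(words)
        doc_table[fname] = count_words(words)
        word_set.update(doc_table[fname])  # iterating a dict yields its keys
    return doc_table, word_set


def read_texts(folder="docs"):
    """Read every regular file in folder as UTF-8 text, keyed by file name."""
    texts = {}
    for fname in sorted(os.listdir(folder)):
        path = os.path.join(folder, fname)
        if os.path.isfile(path):
            with open(path, "r", encoding="utf8") as fd:
                texts[fname] = fd.read()
    return texts
```

A call such as doc_table, word_set = build_tables(read_texts("docs")) then yields the same kind of tables analyze() builds internally; normalize can be any function that takes and returns a list of words.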

Contents of titles.txt:

alice_in_wonderland.txt
Alice’s Adventures in Wonderland
dracula.txt
Dracula
frankenstein.txt
Frankenstein
jane_eyre.txt
Jane Eyre
moby_dick.txt
Moby Dick; or The Whale
pride_and_prejudice.txt
Pride and Prejudice
tale_of_two_cities.txt
A Tale of Two Cities
udolpho.txt
The Mysteries of Udolpho
wizard_of_oz.txt
The Wonderful Wizard of Oz
