CSCI-398: Advanced Applied Programming
A2: Image Search SaaS
Use key-value stores and Ajax web interfaces to search for images in DBpedia, a structured dataset extracted from Wikipedia: have a quick look.
This assignment has two milestones:
- A2M1: Use Python to create a command-line interface (CLI) to load and query the DBpedia/Wikipedia data.
- A2M2: Use Flask, Bootstrap, and React to build a web app (backend and frontend) to query DBpedia and display the search results.
1. Command-line interface (A2M1)
Create a command-line interface (CLI) to load and query DBpedia/Wikipedia. The CLI uses two key-value stores (KVS): images and terms. The terms KVS maps the Wikipedia keywords to Wikipedia articles, and the images KVS maps the articles to images. This assignment aims to create a map from keywords to images. For example, the keyword “cloud” might give us pictures of clouds.
1.1. Getting started
- Install Python (https://www.python.org/downloads/ or use brew on OS X)
- In Git Bash, set the PATH to Python: export PATH="$PATH:/c/Python34"
- Get the code from GitHub: https://goo.gl/TG8hEo
- Clone the a2 repo in the shell (using instructions similar to A0)
- Change to the udc-csci-398-a2 directory
- Confirm Git Bash sees Python: python --version
- Install the dependencies in the shell: pip install -r requirements.txt
- Download the DBpedia files
- images_en.nt.bz2: http://downloads.dbpedia.org/2014/en/images_en.nt.bz2
- labels_en.nt.bz2: http://downloads.dbpedia.org/2014/en/labels_en.nt.bz2
- Decompress both files (e.g., in a terminal, run "bunzip2 images_en.nt.bz2")
- Move the decompressed files to the data/ directory
1.2. Wikipedia data
The images_en.nt file associates Wikipedia categories with images, whereas the labels_en.nt file associates Wikipedia categories with labels. For instance, the category “Cloud” might be associated with an image of a cloud, as well as the label “cloud” (and perhaps other labels). In combination, we can use these files to search for images using search terms (i.e., text).
Both files consist of triples <A> <B> <C> that describe various aspects of Wikipedia categories, not just images and labels. You can think of these triples as an association between <A> and <C> where <B> describes the type of association. The files contain several kinds of triples, but, for the purposes of this assignment, only two types are relevant: the ones in images_en.nt where B is http://xmlns.com/foaf/0.1/depiction (in this case, A is the category and C is an image URL), and the ones in labels_en.nt where B is http://www.w3.org/2000/01/rdf-schema#label (in this case, A is the category and C is the label). You can use the less command to have a look at the files, but you do not need to write code for reading these files directly; we have provided some code for you.
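For illustration only (parser.py already handles this for you), here is a minimal sketch of how one such line could be split into a triple. The helper name parse_triple and the sample image URL are hypothetical:

# Illustration only: parser.py is provided, so you do not need this code.
DEPICTION = "http://xmlns.com/foaf/0.1/depiction"

def parse_triple(line):
    # Split '<A> <B> <C> .' into its three parts. Note that real N-Triples
    # object fields may be quoted literals (e.g., labels), which need more care.
    a, b, c = line.rstrip(" .\n").split(" ", 2)
    return a.strip("<>"), b.strip("<>"), c.strip("<>")

subject, predicate, obj = parse_triple(
    '<http://dbpedia.org/resource/Cloud> '
    '<http://xmlns.com/foaf/0.1/depiction> '
    '<http://upload.wikimedia.org/wikipedia/commons/a/a4/Cloud.jpg> .\n')
# predicate == DEPICTION, so subject (the category) is depicted by obj (the image URL)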
1.3. Loader
Your first task is to write a loader: loader.py. You are given modules to “put together” for the loader, all in the a2m1/ directory.
- kvs.py: implements the key-value stores: disk, mem, and cloud.
- kvs_test.py: test cases for kvs.py.
- parser.py: parses the data files.
- parser_test.py: test cases for parser.py.
- loader.py: loads the parsed data into the key-value stores.
- loader_test.py: test cases for loader.py.
- test.py: runs all tests.
The loader parses the data files, creates the key-value stores, and then loads them. It loads the images store with subject/object pairs (Wikipedia categories and their image URLs) and the terms store with labels and their Wikipedia categories. We have provided code for parsing the data files and creating the key-value stores; your first implementation task is loading the stores. However, there are some restrictions on how you should do this:
- loader.load_images(): You should first index the images from images_en.nt. The parser only returns images where the relationship is http://xmlns.com/foaf/0.1/depiction, i.e., an image that depicts the Wikipedia topic itself, as opposed to other related topics. All images should be stored in a key-value store called images (the kvs parameter), where the key is the Wikipedia category and the value is the image URL (the key-value pairs generated by the image_iterator): see sample code.
- loader.load_terms(): You should next create an "inverted index" from labels_en.nt. The idea is to index, for each label, the Wikipedia category or categories that correspond to it. All labels should be stored in a key-value store called terms (the kvs parameter), where the key is a word from the label and the value is the Wikipedia category. You should only add an entry to this store if the category exists in the images key-value store, i.e., if we have an image for it. If a label contains multiple words, you should create separate entries for each word: see sample code.
To be able to answer queries with approximate matches, you should (a) regularize the case of the words in the label, and (b) use a stemming algorithm to remove suffixes before storing each entry. We have included a "helper" class, Stemmer, which stems words; learn how to use it by looking at the function definition for stem.
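To make the shape of the two loaders concrete, here is a minimal sketch. It is not the provided skeleton: the accessor names kvs.put()/kvs.get() and the iterator parameters are assumptions, so follow the real interfaces in a2m1/.

def load_images(kvs, image_iterator):
    # Key: Wikipedia category; value: image URL.
    for category, image_url in image_iterator:
        kvs.put(category, image_url)

def load_terms(kvs, images_kvs, label_iterator, stemmer):
    # Inverted index: one entry per word of the label, and only for
    # categories that actually have an image.
    for category, label in label_iterator:
        if images_kvs.get(category) is None:
            continue  # no image for this category, so skip its label
        for word in label.split():
            key = stemmer.stem(word.lower())  # case regularization + stemming
            kvs.put(key, category)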
Example: The images KVS may contain the following entries.
key: http://dbpedia.org/resource/American_National_Standards_Institute
value: http://upload.wikimedia.org/wikipedia/commons/8/8f/ANSI_logo.GIF
The terms KVS may contain the following entries.
key: american
value: http://dbpedia.org/resource/American_National_Standards_Institute
key: nate
value: http://dbpedia.org/resource/American_National_Standards_Institute
key: standard
value: http://dbpedia.org/resource/American_National_Standards_Institute
key: institut
value: http://dbpedia.org/resource/American_National_Standards_Institute
Note that the label (“American National Standards Institute”) has been broken into separate words, and that case regularization and stemming have been applied.
1.4. Querier
Now, you will write a querier module, querier.py, that reads keywords from the command line. You are given a querier module in the a2m1/ directory with a stub querier.query() function.
querier.query(): The querier module opens both the terms and images key-value stores. For each keyword, it should retrieve the matching Wikipedia category or categories from the terms store, then retrieve all matching URLs for those categories from the images key-value store. You should apply the same transformations (stemming, case regularization, …) as in the loader: see sample code.
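The logic might look roughly like the sketch below; the get_all() accessor (one key can map to several values) and the function signature are assumptions, so follow the real a2m1 interfaces.

def query(terms_kvs, images_kvs, keywords, stemmer):
    results = {}
    for keyword in keywords:
        key = stemmer.stem(keyword.lower())  # same transformations as the loader
        categories = terms_kvs.get_all(key) or []  # matching Wikipedia categories
        urls = []
        for category in categories:
            urls.extend(images_kvs.get_all(category) or [])  # their image URLs
        results[keyword] = urls
    return results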
The querier prints each keyword, then the list of matches, to the console. The provided main method already contains code to do this; please do not change the output format in any way, since this will break our grading tools.
1.5. Submission checklist
- You implemented the loader.load_images() and loader.load_terms() functions without changing the function definitions.
- You implemented the querier.query() function without changing the function definition.
- You ran the loader tests and passed all test cases: python loader_test.py
- You ran the querier test and passed all test cases: python querier_test.py
- You printed your full name and UDC username to the console when the loader and querier are invoked.
- Your code contains a reasonable amount of useful documentation (required for style points).
- You have completed all the fields in the README.md file.
- You have checked your final code into the git repository.
- You submitted a .zip file
- You included in your .zip file your solution (all .py files)
- You included in your .zip the README.md file, with all fields completed
- Your .zip file is smaller than 100 kB, excluding the large data files (such as images_en.nt or labels_en.nt).
- You submitted your solution as a .zip archive to Blackboard before the A2M1 deadline on the course calendar. (If you choose to use jokers, each joker will extend this deadline by 24 hours.)
2. Web app (A2M2)
You will write a Flask backend and a React frontend; the code is in the a2m2/ subfolder of the git repo.
2.1. Backend
You will write the backend app, backend.py, which implements a REST API. You are given Python code that uses Flask, a Python web framework, to implement our search REST API: /api/search.
A basic backend app in Flask follows.
# backend.py
from flask import Flask

app = Flask(__name__)

@app.route("/api/hello")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()
In addition to implementing our API, Flask also serves static files (backend.send_static) such as index.html. Run the development server in the shell:
$ python backend.py
 * Running on http://localhost:5000/
Test your hello API endpoint in your web browser: go to http://localhost:5000/api/hello
Test data: We need data to test our backend. Run loader.py from A2M1 to generate test data on disk: python loader.py -d --filter="Azh". You are given backend code that assumes you created two Shelf objects (disk-based key-value stores) in the a2m1 directory:
images_kvs = Shelf('a2m1/' + IMAGES_KVS_NAME)
terms_kvs = Shelf('a2m1/' + TERMS_KVS_NAME)
Using the Shelf objects, you should implement the backend:
- backend.search(): You should first return the search results in JSON format. Using the a2m1.query() function, write code to return a list of matches, format the results as a dictionary, convert the dictionary to JSON, and return a JSON response to the user: see sample code.
- backend.name(): You should then create a new API endpoint: /api/name. This endpoint returns your full name and UDC username in JSON: {"name": "Shakir James (shakirjames)"}
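A hedged sketch of both endpoints follows; the querier import path, the query() return value (a list of image URLs), and the request parameter name q are assumptions, so adapt them to the provided a2m2 skeleton.

from flask import Flask, jsonify, request
from a2m1 import querier  # assumed import path

app = Flask(__name__)

@app.route("/api/search")
def search():
    term = request.args.get("q", "")  # e.g., /api/search?q=cloud
    links = querier.query(term)       # assumed to return a list of image URLs
    return jsonify({
        "searchInformation": {"totalResults": len(links)},
        "items": [{"link": url} for url in links],
    })

@app.route("/api/name")
def name():
    return jsonify({"name": "Shakir James (shakirjames)"})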
To run the backend, run the Flask development server in the shell:
$ cd udc-csci-398-a2
$ python -m a2m2.backend
Note: When you make changes to backend.py, you must reload the development server: press CTRL+C to quit, then re-run the backend as above.
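Alternatively, the Flask development server can reload itself when it detects code changes if you enable debug mode (local development only):

if __name__ == "__main__":
    app.run(debug=True)  # auto-reloads on code changes; never use in production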
2.2. Frontend
Next, you will finish the frontend app, frontend/index.js. You are given JavaScript code that implements a single-page web app in React. Our React app expects our backend search API to return the data model:
{
  "searchInformation": {
    "totalResults": 100
  },
  "items": [
    {"link": "http://en.wikipedia.org/wiki/Special:FilePath/AzharUsman.jpg"},
    {"link": "http://en.wikipedia.org/wiki/Special:FilePath/Holland_2004.jpg"}
  ]
}
Our designer gave us an HTML mock (see frontend/mock.html). Thinking in React, we decomposed the frontend app into components[1].
Our frontend app has five components:
- ImageSearchContainer (orange): the app container
- SearchBar (blue): accepts user input
- ThumbnailGrid (green): shows a grid of image thumbnails filtered by user input
- ThumbnailRow (brown): displays a row of images
- ThumbnailImage (red): displays an image
The components form a hierarchy:
- ImageSearchContainer
- SearchBar
- ThumbnailGrid
- ThumbnailRow
- ThumbnailImage
Our React app stores static data in props. Taking a top-down view of the app, the ImageSearchContainer component takes the data model as a prop and its subcomponents render the props data: flowing data down the hierarchy.
The app stores dynamic data in state. The state data adds interactivity: dynamically changing the data model. The state in our app consists of the search text, because the images can be computed from the search text. The searchText state lives in the ImageSearchContainer component, and its subcomponents use callback functions to alter the searchText: flowing data up the hierarchy via explicit callbacks.
We have written most of the frontend code. However, you should write a new React component to display the author name, replacing "Shakir James (shakirjames)" with your own.
Author: Your task is to use your /api/name API endpoint to display your full name and username in the frontend app. You should create an Author React component, get data from the /api/name endpoint, and render the component with the API data: see sample code.
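Before wiring the component up, you can sanity-check the endpoint from Python (standard library only). This hypothetical smoke test assumes the development server is running on localhost:5000 as above:

import json
import urllib.request

# Fetch /api/name and print the JSON payload the frontend will consume.
with urllib.request.urlopen("http://localhost:5000/api/name") as resp:
    data = json.load(resp)
print(data["name"])  # e.g., "Shakir James (shakirjames)"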
2.3. Final testing
Create DynamoDB tables: First, we'll need to create tables in DynamoDB. Navigate to https://console.aws.amazon.com/dynamodb. Click the Create Table button: enter "images" as the table name, enter "kvs_key" as the primary key (hash key), and keep string as the type. For table settings, uncheck "use default settings" and increase both the read and write capacity units from 5 to 20. Click Create. Repeat this process for a table called "terms".
Add your AWS credentials: To work with DynamoDB, you should add your AWS credentials to your boto3 configuration file. We already installed boto3 (from our requirements file). Now, set up your AWS credentials for boto3 in ~/.aws/credentials[2]:
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET
Load data to DynamoDB: To be able to easily switch your code between Shelf and DynamoDB, the M1 CLI accepts a kvs option. Run the loader on the shell to upload data to DynamoDB:
python loader.py -d --kvs=cloud --filter="Az"
The full Wikipedia dataset would result in about 1.5 GB of data, which will take a long time to create (and, worse, a lot of Amazon cycles, which will reduce your credits), so you should set the filter to Ar to index only topics that start with 'Ar', which should result in a manageable database size. For testing, you may want to work with even smaller data sets, e.g., just the first 100 topics.
After some time, your data should be in DynamoDB; you can confirm this by opening the DynamoDB console, clicking on the table, and then clicking on the Items tab.
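You can also spot-check one item from Python with boto3. The table name "images" and key attribute "kvs_key" match the tables created above, while the sample key value is hypothetical:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("images")
# get_item returns a dict containing an "Item" entry only if the key exists.
response = table.get_item(Key={"kvs_key": "http://dbpedia.org/resource/Azhar_Usman"})
print(response.get("Item"))  # None if this key has not been loaded yet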
Modify backend.py: The backend code currently uses Shelf objects, disk-based key-value stores. You should change the backend to use DynamoDB: comment out the lines that instantiate the Shelf objects and uncomment the lines that use the DynamoDB objects.
Re-run your backend: Now, rerun backend.py and navigate back to the URL in the browser. Your image search application should work with any search term that was indexed (if your filter was Ar then any search term starting with these letters should produce image results).
2.4 Submission checklist
- You implemented the backend.name() API endpoint.
- You implemented the frontend Author component.
- Your code contains a reasonable amount of useful documentation (required for style points).
- You have completed all the fields in the README file.
- You have checked your final code into the git repository.
- You are submitting a .zip file
- Your .zip file contains all the files needed to run your solution (including all .js)
- Your .zip file contains the README file, with all fields completed
- Your .zip file is smaller than 100kB.
- You submitted your solution as a .zip archive to Blackboard before the M2 deadline on the first page of this assignment. (If you choose to use jokers, each joker will extend this deadline by 24 hours.)
2.5 Notes
Please keep in mind that Amazon charges for machine utilization, data transfer, and data storage. Enrolling in Amazon’s AWS Educate program should give you sufficient credit to complete this assignment (as well as the remaining assignments). Nevertheless, you should carefully monitor your credit level to make sure that you do not run out, and you should release any resources when you are no longer using them. For instance, after completing this assignment, you should delete the data you uploaded to DynamoDB.
3. Extra Credit
We will offer the following extra credit items in this assignment:
- M1: Submit an additional implementation of the Querier in JavaScript [+10%]
- M2: Don’t include images that are dead links. [+10%]
- M2: Infinite scrolling – show all images on the app. However, don’t load all at once – just load eight. When the user scrolls down and reaches the bottom of the page, you should load and display eight more – similar to how the Facebook news feed works. [+10%]
These points will only be awarded if the main portions of the assignment work correctly. Any extra credit solutions should be submitted with the relevant milestone, and they should be described in the README.
[1] "Thinking in React." Facebook React documentation, 2014. Retrieved 26 Sep. 2016. https://facebook.github.io/react/docs/thinking-in-react.html
[2] "boto/boto3: AWS SDK for Python." GitHub, 2013. Retrieved 28 Sep. 2016. https://github.com/boto/boto3