Search Engine in The Wild

Closed
Riipen Test Company 1
Vancouver, British Columbia, Canada
Jordan Ell
CTO
Project
Academic experience
120 hours of work total
Student
Anywhere
Advanced level

Project scope

Categories
No categories selected
Skills
computer programming, software development and testing, information retrieval, text analytics
Details

Riipen has a large set of data being created every day from its online platform. A challenge we face is letting users get the information they need when they need it, either by searching appropriate fields or by conducting a free-text search à la Google.

Given a large relational data set, students should provide a storage facility for the data, such as a relational database or a document store, with an API for searching it. The data should be searchable by:

  • Direct matching fields

Given a field name and a value, the API should be able to find documents containing that field with that exact value. An example would be: Find all documents which have a name of "My Document".

Three types of field values must be matchable: text values, number values, and date values.
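A minimal sketch of direct matching, assuming documents are held as in-memory Python dictionaries. The `DOCUMENTS` list, its field names, and the `match_exact` function are hypothetical and only illustrate the requirement; a real solution would sit on top of whichever database or document store is chosen.

```python
from datetime import date

# Hypothetical sample documents; the field names are illustrative only.
DOCUMENTS = [
    {"name": "My Document", "pages": 12, "created": date(2020, 1, 15)},
    {"name": "Other Document", "pages": 3, "created": date(2020, 1, 15)},
    {"name": "My Document", "pages": 3, "created": date(2021, 6, 1)},
]


def match_exact(documents, field, value):
    """Return the documents that contain `field` with exactly `value`.

    The type check is an assumption of this sketch: it keeps a text query
    such as "3" from matching a number field that holds 3.
    """
    return [
        doc
        for doc in documents
        if field in doc
        and type(doc[field]) is type(value)
        and doc[field] == value
    ]


by_text = match_exact(DOCUMENTS, "name", "My Document")
by_number = match_exact(DOCUMENTS, "pages", 3)
by_date = match_exact(DOCUMENTS, "created", date(2020, 1, 15))
```

Each of the three calls returns two of the sample documents, one call per supported value type.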

  • Fuzzy matching fields

Given a field name and a value, the API should be able to find documents which contain that field with "mostly" the given value. An example would be: find all documents which have a name of "My Docu", which would return a document with the name "My Document".
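One possible reading of "mostly" is sketched below: a value matches if the query is a substring of it, or if the two strings are similar enough according to `difflib.SequenceMatcher` from the Python standard library. The 0.7 threshold and the names `is_fuzzy_match` and `match_fuzzy` are assumptions made for illustration, not part of the project brief.

```python
from difflib import SequenceMatcher


def is_fuzzy_match(query, text, threshold=0.7):
    """True if `query` is contained in `text` or is similar enough to it."""
    query, text = query.lower(), text.lower()
    if query in text:
        return True
    # ratio() is a similarity score between 0.0 and 1.0.
    return SequenceMatcher(None, query, text).ratio() >= threshold


def match_fuzzy(documents, field, value, threshold=0.7):
    """Return documents whose text `field` fuzzily matches `value`."""
    return [
        doc
        for doc in documents
        if isinstance(doc.get(field), str)
        and is_fuzzy_match(value, doc[field], threshold)
    ]


# Hypothetical sample data.
DOCUMENTS = [{"name": "My Document"}, {"name": "Other Report"}]

partial = match_fuzzy(DOCUMENTS, "name", "My Docu")      # truncated query
typo = match_fuzzy(DOCUMENTS, "name", "My Documnet")     # transposed letters
```

Both queries return only the "My Document" entry. A production system would more likely rely on the fuzzy or full-text features of its storage engine than on a linear scan like this one.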

  • Fuzzy matching multiple fields by weight

Given a list of field names, each with a weight, the API should be able to find documents whose fields fuzzily match the search value, and rank them by the weights of the matching fields.

For example, find all documents by the search term "My Document", but only search the "name" and "summary" fields, weighting the name at 80% and the summary at 20%. Two documents, one with the value "My Document" in its name and the other with that value in its summary, should both be found; however, the document with the value in the "name" field should be ranked first.
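The weighted ranking can be sketched as a sum of per-field similarities multiplied by their weights. This is only one plausible scoring scheme: the `similarity` and `weighted_search` functions, the 0.7 threshold, and the sample documents are all assumptions of this sketch.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_search(documents, term, weights, threshold=0.7):
    """Rank documents by the summed weight of the fields that match `term`.

    `weights` maps a field name to its weight, e.g. {"name": 0.8}.
    A field contributes only if its similarity reaches `threshold`;
    documents with no contributing field are left out.
    """
    ranked = []
    for doc in documents:
        score = 0.0
        for field, weight in weights.items():
            sim = similarity(term, str(doc.get(field, "")))
            if sim >= threshold:
                score += weight * sim
        if score > 0:
            ranked.append((score, doc))
    # Highest score first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Hypothetical sample data mirroring the example in the text.
DOCUMENTS = [
    {"id": "b", "name": "Annual Report", "summary": "My Document"},
    {"id": "c", "name": "Budget", "summary": "Numbers for next year"},
    {"id": "a", "name": "My Document", "summary": "Quarterly figures"},
]

results = weighted_search(DOCUMENTS, "My Document", {"name": 0.8, "summary": 0.2})
ranked_ids = [doc["id"] for _, doc in results]
```

With this data, document "a" (match in the name) ranks ahead of document "b" (match in the summary), and document "c" is not returned.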

  • Combining search techniques

Given any number of search criteria as listed above, perform a search based on all criteria. All the criteria should have the option of being "must be present" or "may be present".

For example, a search might be: find all documents whose name is "My Document" and whose summary fuzzily contains the text "this is a summary". This would be an example of "must be present", as every search criterion must be met.

Another example would be: find all documents whose name is "My Document" or whose summary fuzzily contains the text "this is a summary". This would be an example of "may be present", as only one of the search criteria must be met.
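The two examples above can be sketched by tagging each criterion as required or optional. The semantics chosen here are an assumption: every "must be present" criterion has to pass, and when a search has only "may be present" criteria, at least one of them has to pass. The names `Criterion`, `exact`, `fuzzy`, and `search` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Criterion:
    test: Callable[[dict], bool]  # predicate applied to one document
    required: bool                # True = "must be present", False = "may be present"


def exact(field, value):
    """Build a predicate for a direct field match."""
    return lambda doc: doc.get(field) == value


def fuzzy(field, value, threshold=0.7):
    """Build a predicate for a fuzzy (substring or similar-string) match."""
    def test(doc):
        text = str(doc.get(field, "")).lower()
        needle = value.lower()
        return needle in text or SequenceMatcher(None, needle, text).ratio() >= threshold
    return test


def search(documents, criteria):
    must = [c.test for c in criteria if c.required]
    may = [c.test for c in criteria if not c.required]
    results = []
    for doc in documents:
        if not all(t(doc) for t in must):
            continue  # a required criterion failed
        if not must and may and not any(t(doc) for t in may):
            continue  # only optional criteria, and none of them passed
        results.append(doc)
    return results


# Hypothetical sample data.
DOCUMENTS = [
    {"id": 1, "name": "My Document", "summary": "This is a summary of the plan"},
    {"id": 2, "name": "My Document", "summary": "Unrelated notes"},
    {"id": 3, "name": "Other", "summary": "this is a summary"},
    {"id": 4, "name": "Other", "summary": "Nothing here"},
]

both_required = search(DOCUMENTS, [
    Criterion(exact("name", "My Document"), required=True),
    Criterion(fuzzy("summary", "this is a summary"), required=True),
])
either_optional = search(DOCUMENTS, [
    Criterion(exact("name", "My Document"), required=False),
    Criterion(fuzzy("summary", "this is a summary"), required=False),
])
```

Here `both_required` contains only document 1, while `either_optional` contains documents 1, 2, and 3. When required and optional criteria are mixed, this sketch lets the optional ones pass through without filtering; a fuller design might use them to boost ranking instead.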

Restrictions:

The search API can be exposed through anything from a command-line interface to a UI. The interface must not make use of raw SQL.


  • A report describing the complete project life-cycle.
  • A class presentation with demo, if possible.
Deliverables
No deliverables exist for this project.

About the company

Company
Vancouver, British Columbia, Canada
0 - 1 employees
Banking & finance
