Search Engine

Link to the web application (it can take about 20 seconds to start)

In 2017, to help my team improve their data science skills, I set out to build a search engine for online data science courses across the main learning platforms, such as Coursera, Udemy, edX and Udacity.

I did a proof of concept using:

  • Django as the web framework
  • Python as the main programming language
  • Pandas as the data analysis and manipulation tool
  • PostgreSQL as the database and full-text search engine
  • Bootstrap as the front-end library

The project consisted of the following steps:

  1. Getting the data from APIs, filtering it, cleaning it and storing it
  2. Creating scheduled jobs that run the above script on a daily basis and store new courses
  3. Creating the front end using Bootstrap
  4. Searching the courses using the full-text search feature of PostgreSQL
  5. Ranking the search results
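
Steps 1 and 2 can be illustrated with a minimal Pandas sketch. This is not the original script: the record fields (title, description, url), the keyword list and the function names are all assumptions made for the example, and the real platform APIs returned their own formats.

```python
import re

import pandas as pd

# Hypothetical keywords used to keep only data science courses.
KEYWORDS = ["data science", "machine learning", "statistics"]


def clean_courses(raw_records, platform):
    """Turn raw API records (a list of dicts) into a tidy DataFrame of courses."""
    columns = ["title", "description", "url"]
    df = pd.DataFrame(raw_records, columns=columns)
    df["platform"] = platform

    # Normalize text: missing values become empty strings, whitespace is trimmed.
    for col in columns:
        df[col] = df[col].fillna("").astype(str).str.strip()

    # Drop records without a title or a URL, then drop repeated URLs.
    df = df[(df["title"] != "") & (df["url"] != "")]
    df = df.drop_duplicates(subset="url")

    # Keep only courses whose title or description mentions a keyword.
    pattern = "|".join(re.escape(k) for k in KEYWORDS)
    text = (df["title"] + " " + df["description"]).str.lower()
    df = df[text.str.contains(pattern, regex=True)]

    return df.reset_index(drop=True)


def select_new_courses(cleaned, known_urls):
    """Return only the courses whose URL is not already stored."""
    return cleaned[~cleaned["url"].isin(known_urls)].reset_index(drop=True)
```

A daily job would call clean_courses on each platform's response, then pass the result to select_new_courses together with the URLs already in the database, so that only unseen courses are inserted.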

Step 5 takes into account the following:

  • How often the query terms appear in the course information
  • How close together the terms are in the course information
  • How important the part of the course information is where the terms occur. In this case, search terms are given more weight when they appear in the title
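
The three criteria above match what PostgreSQL's built-in ranking functions offer. The sketch below shows one way such a query could look; it is not the application's actual code. The table name (courses), its columns and the helper name are assumptions, and the choice of ts_rank_cd (cover density ranking, which is sensitive to term proximity) is also an assumption, since the project may have used ts_rank instead. The title is labelled with weight A and the description with weight D, so matches in the title count more.

```python
# Hypothetical table "courses" with columns id, title and description.
SEARCH_SQL = """
SELECT c.id, c.title, ts_rank_cd(c.document, query) AS rank
FROM (
    SELECT id, title,
           setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(description, '')), 'D')
               AS document
    FROM courses
) AS c,
plainto_tsquery('english', %(terms)s) AS query
WHERE c.document @@ query
ORDER BY rank DESC
LIMIT %(limit)s
"""


def build_search_query(terms, limit=20):
    """Return the SQL text and parameters for a ranked full-text search.

    The search terms are passed as a query parameter rather than formatted
    into the SQL string, so the database driver handles the escaping.
    """
    cleaned = terms.strip() if terms else ""
    if not cleaned:
        raise ValueError("search terms must not be empty")
    return SEARCH_SQL, {"terms": cleaned, "limit": int(limit)}
```

The returned pair can be handed to a DB-API cursor that accepts named parameters in this style (psycopg2, for example), as in cursor.execute(sql, params). In a real deployment the weighted document would normally be stored in an indexed tsvector column rather than computed on every search.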

Please note that the application is still live but the code is not maintained. Some APIs have been deprecated, so the list of courses used in the search is from 2017.

Jaafar Saadani
Founder & Principal