McMaster University Continuing Education
Hamilton, Ontario, Canada
Instructor
Timeline
  • January 22, 2025
    Experience start
  • April 10, 2025
    Experience end
Experience
1/2 project matches
Dates set by experience
Preferred companies
Anywhere
Any company type
Any industries

Experience scope

Categories
Data visualization, Data analysis, Data modelling, Data science
Skills
NoSQL, Apache Hadoop, adult education, Apache Kafka, Apache Cassandra, computer science, Apache Spark, data analysis
Student goals and capabilities

This course is part of the Data Analytics certificate program. Students in the program
are adult learners with a post-secondary degree or diploma in computer science,
engineering, business, or a related field.

Students learn how to collect, manage, analyze, and visualize data to deliver clear
business insights from raw data sources. The course covers the Hadoop ecosystem, which
serves as a primary platform for other tools such as Spark and Kafka. It also covers an
example of a NoSQL database, such as Cassandra, which is suited to distributed
computing. Emerging tools and technologies may be presented as applicable to course
content.

Students

Continuing Education
Beginner, Intermediate levels
24 students
Project
40 hours per student
Students self-assign
Teams of 4
Expected outcomes and deliverables

The final project deliverables will include:

  • A report on students’ findings and details of the problem presented
  • Ideas for future collaboration, identified from the current project outcomes

Project Examples

Requirements

The project provides an opportunity for businesses and learners to collaborate to
identify a real business problem and translate it into an analytics problem.

The projects, which can be short, allow students to apply the skills they have acquired
with the various tools to address the business problem. Some examples are:

  • Install and use a Linux distribution on a virtual machine
  • Discuss the differences between relational databases and NoSQL databases
  • Explain the basic components of the Hadoop ecosystem
  • Address the business problem using:
    - the Cassandra Query Language (CQL) to store and retrieve data in Cassandra
    - the MongoDB Shell to store and retrieve data in MongoDB
    - popular tools (such as Tableau and Microsoft Power BI) to build graphs and charts from data in a NoSQL database
    - Python and a graphics library to connect to a database and build graphs and charts


You should submit a high-level proposal or business problem statement that includes the
relevant data sets and definitions, a list of acceptable tools (if applicable), and the
expected deliverables. Business datasets can be provided under a non-disclosure
agreement or in an anonymized or synthetic format that is relevant to your
organization and business problem. The course instructors will review the documents
to confirm the scope and timing of the proposed problem and its alignment with the
capstone course requirements.


Analytics solutions may apply to, but are not limited to, the following topics:

1. Demand for social services (healthcare, emergency services, infrastructure, etc.)
2. Customer acquisition and retention
3. Merchandising for trade areas (categories)
4. Quantifying Customer Lifetime Value
5. Determining media consumption (mass vs. digital)
6. Cross-sell and upsell opportunities
7. Developing high-propensity target markets
8. Customer segmentation (behavioral or transactional)
9. New product or product line development
10. Market Basket Analysis to understand which items are often purchased together
11. Ranking markets by potential revenue
12. Consumer personification
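
As an illustration of topic 10, Market Basket Analysis, the sketch below counts how often pairs of items appear in the same transaction and keeps the pairs whose support (the share of baskets containing both items) meets a threshold. It is a minimal example, not a prescribed method: the function name `pair_support`, the threshold, and the toy baskets are all hypothetical, and a real project would read transactions from the partner's dataset.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support=0.5):
    """Return item pairs whose support meets min_support.

    Support of a pair = (baskets containing both items) / (total baskets).
    """
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical toy data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

frequent = pair_support(baskets, min_support=0.5)
print(frequent)
```

With these four baskets, only bread and milk appear together in at least half of the transactions. Pair counting like this grows quickly with the number of distinct items, which is one reason larger datasets are usually handled with a distributed tool such as Spark.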


To ensure students’ learning objectives are achieved, we recommend that datasets
contain at least 20,000 rows. Data must be ‘clean’. If more than one database is
provided and they must be joined, students will be required to integrate them. This
supports the learning experience and minimizes partner data preparation.
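
The integration step described above can be sketched as follows, using only the Python standard library. This is a hedged illustration under stated assumptions: the column names (`customer_id`, `region`, `amount`), the inline CSV text, and the helper names are hypothetical, and the 20,000-row figure is taken from the recommendation above. The sketch joins two small tables on a shared key, counts rows with empty fields as a rough cleanliness check, and compares the joined row count to the recommended minimum.

```python
import csv
import io

MIN_ROWS = 20_000  # recommended minimum dataset size from the text above


def load_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


def count_incomplete(rows):
    """Count rows that have at least one missing or blank field."""
    return sum(
        1
        for row in rows
        if any(v is None or not str(v).strip() for v in row.values())
    )


# Hypothetical sample data; a real dataset would be far larger.
customers_csv = "customer_id,region\n1,East\n2,West\n3,\n"
orders_csv = (
    "order_id,customer_id,amount\n"
    "10,1,25.00\n11,1,40.00\n12,2,15.50\n13,4,9.99\n"
)

customers = load_rows(customers_csv)
orders = load_rows(orders_csv)

joined = join_on_key(orders, customers, "customer_id")
meets_minimum = len(joined) >= MIN_ROWS

print(len(joined), count_incomplete(customers), meets_minimum)
```

An inner join silently drops orders whose customer is missing from the other table (order 13 here), so comparing row counts before and after the join is a quick way for students to spot integration gaps.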