This exercise includes data preparation and model building. Along with these instructions, you have been sent a zip file containing a data set from Kaggle.
1. We have sent 220 resumes, stored as JSON.
2. Each JSON item contains the text content of the resume, as well as a collection of labelled annotations.
3. We would like you to prepare the data:
a. Trim the text and clean it of special characters.
b. Using line breaks, commas or spaces, attempt to separate the annotations into individual skills.
4. Create a model which attempts to group and score the resumes in these categories: testing, development and management.
a. You may use any data, but skills, qualifications and job titles will be most relevant.
5. Output a table containing the index of each resume and its three scores for testing, development and management.
6. Comment every line of code, explaining how it works or what it does.
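As a starting point only, here is a minimal Python sketch of steps 3 to 5. It is not a reference solution. It makes several assumptions that are not stated above: each resume is a dict with an "annotation" list whose entries carry a "text" field (check the real JSON for the actual field names), the keyword sets are hypothetical placeholders, and simple keyword counting stands in for a real model.

```python
import re

# Hypothetical keyword sets per category (assumption, not part of the data set).
KEYWORDS = {
    "testing": {"selenium", "qa", "junit", "manual testing"},
    "development": {"java", "python", "sql", "javascript"},
    "management": {"scrum", "agile", "budgeting", "leadership"},
}


def clean_text(text):
    """Replace special characters with spaces and collapse whitespace."""
    # Keep letters, digits, whitespace and a few symbols common in skill names.
    text = re.sub(r"[^A-Za-z0-9\s.,+#/-]", " ", text)
    # Collapse runs of whitespace, including line breaks, into one space.
    return re.sub(r"\s+", " ", text).strip()


def split_skills(annotation_text):
    """Split one annotation into individual lower-case skills."""
    # Split on line breaks and commas first.
    parts = re.split(r"[\r\n,]+", annotation_text)
    # Fall back to spaces only when no break or comma was found (assumption).
    if len(parts) == 1:
        parts = annotation_text.split()
    # Clean each piece, lower-case it and trim leftover punctuation.
    cleaned = [clean_text(part).lower().strip(" .,/-") for part in parts]
    # Drop pieces that became empty after cleaning.
    return [skill for skill in cleaned if skill]


def score_resume(skills):
    """Score a resume as the share of its skills found in each keyword set."""
    # Avoid division by zero for resumes without any skills.
    total = len(skills) or 1
    # Count matching skills per category and normalise by the skill count.
    return {
        category: sum(skill in words for skill in skills) / total
        for category, words in KEYWORDS.items()
    }


def score_table(resumes):
    """Return rows of (index, testing, development, management)."""
    rows = []
    for index, resume in enumerate(resumes):
        skills = []
        # "annotation" and "text" are assumed field names.
        for annotation in resume.get("annotation", []):
            skills.extend(split_skills(annotation.get("text", "")))
        scores = score_resume(skills)
        rows.append(
            (index, scores["testing"], scores["development"], scores["management"])
        )
    return rows


# Tiny made-up example in the assumed shape, not taken from the data set.
sample = [{"content": "...", "annotation": [{"text": "Java, Python\nSelenium"}]}]
rows = score_table(sample)
```

Each row in rows holds the resume index followed by the testing, development and management scores, so it can be printed directly or loaded into a table library of your choice. The space fallback in split_skills will break multi-word skills apart, which is one of the trade-offs worth discussing in your own solution.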
We hope you enjoy the exercise, and we look forward to reading and discussing your solution.