Surviving My First Data Science Project

by Mark Patterson


Introduction

Back in September I enrolled in a data science boot camp. This is a 5-month program that teaches you all the necessities for becoming a data scientist: programming in Python, statistics, math, modeling, and many other topics that I have been fearful of. One of the key aspects of the program is the set of real-world, large-scale projects where you get to apply what you have learned. The first phase of the boot camp focuses on learning computer programming using the Python language. After 3 weeks of intensive reading, discussion, hands-on labs, and good doses of fear and frustration, it was time to devote a week to my first project. In this post I share some of the tools and practices that got me through the ups and downs of this emotional roller coaster.



Setting project boundaries

Having to do one of these projects can at first seem like an insurmountable task, but I found that it helps to fully understand the assignment and expectations. I read the project brief and all of the provided documentation, and I made sure to ask questions of my instructor and project teammate. The next step was to break the project into manageable chunks and figure out a general timeline. My teammate and I tried using an online scrum board, trello.com. Trello allows you to post “cards” on a virtual board to denote which tasks have not been started, which are in progress, and which are completed. It was helpful for breaking the project up into pieces, but we did not end up using it much after the first couple of days.

The other thing that helps set the boundaries for a project like this is to come up with a well-defined business case and related questions you are trying to answer with your analysis. My teammate and I made sure to select questions we felt were doable and within our capabilities. Then it was a matter of staying focused and not straying away from the questions we were trying to answer.


Communicating with your teammate

For this first project, having a teammate helped. Your teammate is your first go-to when you have a question. Your teammate can help keep you on track. And your teammate can help you through the rough spots and cheer you on. From the start my partner and I set up twice-daily check-ins via Zoom (morning and late afternoon). This helped to ensure we were on track and not repeating work, and it gave us a chance to work through particularly difficult questions. As we got to the end of the project, coordination was key as we put together and posted all of our final deliverables. Go team!


Filling in the gaps with online resources

One of the key lessons they teach you from day one of boot camp is that you will need to “Google” for answers and consult documentation… a lot. And this project allowed me to practice doing that… a lot. The good news is that there are plenty of answers out there on the internet. The bad news is that they are not all clearly explained or demonstrated.

There were 2 online resources that I kept going back to and found particularly useful. I found that I picked up more understanding of concepts by watching YouTube videos. Sure, it takes more time, but I have found the payoff to be better. For example, I learned a lot from the Python Pandas Q&A Series by Data School. Basic concepts were clearly explained and demonstrated with varying levels of complexity, and alternative approaches were typically discussed. I also appreciated the format and clarity of the GeeksforGeeks.org website. This website presents answers to specific questions, and this is even reflected in the left navigation for easy reference to related topics. I find this more closely matches my student mindset, as it is task-based as opposed to feature-based (like a lot of documentation). In general, I gravitated more to websites for answers when I just needed a reminder on the code.

Keeping organized

One of the marvels of our coding practice is the use of Jupyter Notebooks, a browser-based tool that combines code cells and text cells (markdown) so you can easily provide labels, explanations, and other text to go along with your code. It also does a great job of showing charts and plots inline without having to worry about excessive formatting. We made extensive use of Jupyter Notebooks both for working copies of our work and for the final project. I got in the habit of creating a new Notebook each day, so that things did not get too overwhelming. This did not completely replace my old-school ways of relying on paper and pen, as I still found it helpful to write notes about process steps and details about the progression of data frames used in my analysis.

My teammate and I also made use of Zoom for our check-in meetings and when we needed to talk. Otherwise we exchanged questions and attachments via Slack.


Taking breaks and turning off

One of the big surprises for me was how engaged I got in the project. I was eager to get up early and start working, and I often found myself working late into the night. There is something about the chase to solve a problem that can get the adrenaline flowing. Sometimes you are close to figuring something out and just want to keep going until you get it. Early in the program our education coach cautioned us to be sure to take breaks and not wear ourselves out. During this project I learned just what she was talking about, and I had to force myself to take some afternoon walks to clear my head and to watch a bit of Netflix before I could go to sleep at night. This project gave me a better understanding of the stereotype of the work-engrossed software engineer. It is an interesting phenomenon to get so immersed in your work.


Knowing when to stop

Periodically my teammate and I had to remind ourselves that “OK is good enough.” Although we would have liked to continue improving our analysis and to try a few different things, we had to put on the brakes in order to get all of our deliverables completed by the end of the week. It is important to keep in mind that the project is about the journey (the process), not the destination (the final analysis). As part of this project I learned a lot about using Jupyter Notebooks, writing README files, working with GitHub, creating a non-technical presentation, and working together as a team. It’s important to keep in mind the meta-goal of learning how to learn.


Conclusion

The good news is I made it through my first project. My teammate and I reached the end of project week and had something to show that we were proud of. It may not be my best work ever, but I feel it does a good job of showing my skills at an early stage of the journey. Did I learn anything? Hell yes! I recall sitting down on the couch a few days before the end of the project and jotting down pages of notes as I reflected on the experience. I look forward to 4 more projects and expect to see improvements from repetition and new learnings. And one of the best benefits of all is that I have transformed my fear of a big project into excitement, and I now embrace the opportunity to practice, reinforce, and sharpen skills moving forward.
