Sparkify is a fictional music-streaming service created by Udacity. The service is represented by a 26-million-row dataset that records the behavior and characteristics of 27 thousand users for each second of October and November 2018. At 12 gigabytes, this dataset falls into the realm of ‘Big Data,’ which means that it is too large for a conventional computer to analyze. The GitHub repository for this project is here:
For any service, the issue of a customer / user quitting or ‘churning’ is concerning. …
The worldwide pandemic has changed our lives, mostly for the worse. The loss of life, economic opportunity, and old-fashioned human contact will be felt for years to come. But there was one aspect of the pandemic that I enjoyed.
Traffic! Or, the lack thereof.
I wondered recently if automobile crashes had improved during the pandemic as well. If there are fewer cars on the road, then surely there should be fewer crashes, right? After about 5 seconds of searching, I found this dataset:
I am using “Version 4,” which…
There is a wealth of information about ML methods and stats for the aspiring data scientist, but not nearly enough about how a project should be organized. All too often I have seen data science projects presented in Jupyter Notebooks almost haphazardly. The lack of organization obscures the insight and can even hide disastrous methodological errors (like a data leak).
But there is a solution! By using pipelines from scikit-learn, you can develop a consistent workflow that shows all of your data transformations and can be applied to many data science projects.
The Pipeline() class obtains this…
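To make the pipeline idea concrete, here is a minimal sketch of a scikit-learn Pipeline that chains one transformation and one model. The synthetic data, the step names ("scale", "model"), and the choice of StandardScaler and LogisticRegression are all assumptions made for illustration; they are not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not a dataset from the article).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the names here are illustrative.
pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# fit() learns the scaler's statistics from the training split only,
# so nothing from the test split leaks into the preprocessing.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler, then the classifier.
accuracy = pipe.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because every transformation lives inside the one object, the full workflow can be read from the list of steps, and the same structure can be reused on another project by swapping the estimators.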
Always reading, always learning, and always looking for opportunities to implement data science.