A high-level distillation of the common processes followed in an applied data science context.
The breadth of data science and the scope of its applications in business, healthcare, and beyond make it difficult to distill into fundamental steps common to every use case. However, here are four steps that I have found to be mostly consistent, both in my work as a data science consultant and in an in-house position.
Collect and store relevant historical data tables in a data warehouse. Join, aggregate, and correct data to create a modeling dataset for analysis.
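As a rough illustration of the join, aggregate, and correct step, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the warehouse. The `customers` and `orders` tables, their columns, and the cleaning rules (treat a missing amount as zero, drop negative amounts) are all hypothetical choices made for the example, not a prescription.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES
        (10, 1, 120.0),
        (11, 1, 80.0),
        (12, 2, NULL),   -- missing amount, treated as 0 below
        (13, 2, 50.0),
        (14, 3, -5.0);   -- invalid negative amount, excluded below
""")

# Join: attach each valid order to its customer (customers with no valid orders are kept).
# Correct: replace a missing amount with 0 and exclude negative amounts.
# Aggregate: one row per customer with order count and total spend.
modeling_rows = conn.execute("""
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id)            AS n_orders,
           SUM(COALESCE(o.amount, 0.0)) AS total_spend
    FROM customers AS c
    LEFT JOIN orders AS o
           ON o.customer_id = c.customer_id
          AND (o.amount IS NULL OR o.amount >= 0)
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
""").fetchall()
conn.close()

for row in modeling_rows:
    print(row)
```

Each resulting row is one customer, which is the grain a downstream model would train on. In practice the same logic would typically live in the warehouse's own SQL dialect or in a dataframe library.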
Generate visualizations to highlight KPIs relevant to the client’s business performance.
Train a series of statistical models on the historical dataset, then compare their accuracy to find the best model.
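The train-and-compare loop can be sketched in a few lines. This example is only illustrative: the data is synthetic, the two candidates (a mean baseline and a least-squares line) are placeholders for whatever models a real project would try, and held-out RMSE is assumed as the accuracy measure.

```python
import math
import random

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Second candidate: least-squares line y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))

# Synthetic "historical" data: a noisy linear trend (hypothetical).
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Hold out every fourth point so models are compared on data they did not see.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
test = [p for i, p in enumerate(pairs) if i % 4 == 0]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Fit every candidate on the training split and score it on the held-out split.
candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
scores = {
    name: rmse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(scores, "best:", best_name)
```

Scoring on a held-out split rather than the training data is the design choice that matters here; the specific models and metric are interchangeable.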
Deploy the best-fitting model to the cloud and invoke it via API with new data points to make predictions.
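To make the deploy-and-invoke pattern concrete, here is a minimal local sketch using only the Python standard library. It runs the server on localhost rather than in the cloud, and the `/predict` route, the JSON payload shape, and the hard-coded coefficients are all assumptions made for illustration; a real deployment would load a trained model and sit behind a managed service.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical fitted model: coefficients are hard-coded for the sketch.
INTERCEPT, SLOPE = 3.0, 2.0

def predict(x):
    return INTERCEPT + SLOPE * x

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body, e.g. {"x": 4.0}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["x"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def call_api(host, port, x):
    """Client side: send one new data point and return the prediction."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"x": x}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())["prediction"]
    finally:
        conn.close()

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = call_api("127.0.0.1", server.server_address[1], 4.0)
print(result)

server.shutdown()
server.server_close()
```

The point of the split is that the caller only needs the endpoint and the payload format, not the model itself, so the model can be retrained and redeployed without changing client code.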