
Ways in which structured methodologies can help
Here are several points to bear in mind concerning the advantages of structured methodologies:
- Data is coming at us fast and furious. We need to keep track of the many data sources, evaluate which ones are best to use at any given time, and continually monitor them for accuracy. Expect changes to arrive sooner than planned. Predictive modelers need a structured methodology to keep track of things, because changes can be disruptive at whatever stage of the modeling process the modelers are in.
- The difference between useful data and data masquerading as useful data is increasing. Structured methodologies help with maintaining metadata repositories, which can help in determining which data is useful and which is not.
- The number of data analysis techniques is increasing. Knowing which analytical techniques to choose can be a daunting task. Dedicating projects purely to evaluating which techniques are more useful than others for a particular business problem is a laudable goal.
- Structured methodologies help with objectivity. Everyone has their own subjective technical biases that they bring to the table. Creating a structured way for sharing code and results can encourage out-of-the-box thinking.
- Incremental Improvement: Often projects are too large or ambitious. Projects can be organized so that they deliver value in small increments. This is easier to attain when projects are encapsulated within a structured methodology.
- Iterative analytics development uses structured techniques that enforce good data analytics practice, such as iterating in small steps. If any discrepancies are found later on, it is relatively easy to backtrack through the incremental updates.
- Divide and conquer helps to organize projects involving multiple team members who work on different parts of the project.
- Reproducibility helps analytic teams reproduce the same results again and again. This has always been important in research, but it also has implications for any large-scale data project, in which you can be dealing with a multitude of raw data sources. Often, one needs help with understanding transformed data sources in which business rule transformations are unclear and can be changed without your knowledge. Reproducibility matters when implementing version control, and it matters just as much when you are upgrading packages and need to recreate results. When data sampling is involved, the original selection criteria that produced the sample may no longer be available, and reproducibility may be lost. Therefore, it is important to develop sampling strategies in a structured way, so that they are robust and can be reproduced in future analyses.
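
The point about reproducible sampling can be made concrete with a short sketch. This is only one possible approach and not a method prescribed by the text: it assumes every record carries a stable unique identifier, and the names `in_sample`, `draw_sample`, and the `salt` value are hypothetical, chosen for illustration. Whether a record is in the sample depends only on its identifier, the sampling rate, and the salt, so the same sample can be redrawn later even if the source data has been reordered.

```python
import hashlib


def in_sample(record_id: str, rate: float, salt: str = "sample-v1") -> bool:
    """Decide sample membership from the record ID alone (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the record if its bucket falls below the rate-scaled threshold.
    return bucket < int(rate * 2**64)


def draw_sample(record_ids, rate, salt="sample-v1"):
    """Return the IDs selected for the sample, in their input order."""
    return [rid for rid in record_ids if in_sample(rid, rate, salt)]


# Illustrative IDs only; real projects would use keys from the data source.
ids = [f"cust-{i}" for i in range(1000)]
first_draw = draw_sample(ids, 0.1)
# Redrawing from reordered data selects exactly the same records.
second_draw = draw_sample(list(reversed(ids)), 0.1)
```

Recording the salt and rate alongside the analysis (for example, under version control) is what keeps the selection criteria available later. Changing the salt produces a different, but equally repeatable, sample.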