How to get rid of data issues on your project

If you are like me, you have been involved in many projects that deal with data, and most of the time those projects have run into data issues. In most IT projects, the reason the project exists is that the business needs a better way to manage customer data.

In the article ‘22 tips for better data science’, I found that some of the tips resonated with me more than others. Here are the ones that jumped out at me:

I like #5: “Fast delivery is better than extreme accuracy. All data sets are dirty anyway. Find the perfect compromise between perfection and fast return.” I have often seen teams stress about needing to get the data exactly right. When you're working with existing data and trying to manipulate it to fit a newly designed system, it's like trying to put a round peg into a square hole: it doesn't work. What can you do to get the most data into the new structure in the shortest possible time? What data could you afford to leave behind?

I also agree with #12: “You don’t have to store all your data permanently. Use smart compression techniques, and keep statistical summaries only, for old data. Don’t forget to adjust your metrics when your data changes, to keep consistency for trending purposes.” Sometimes we get too hung up on needing to keep all the data. Storage is not much of a problem these days with terabyte storage devices, but how often does the business actually need to go back to the detailed data? Ask that question at the start of the project, and define the data storage needs in your business case.

#17, “Data + models + gut feelings + intuition is the perfect mix. Don’t remove any of these ingredients in your decision process”, is my favourite tip. As a project manager, it's great to have data and models, but what really works for me is also tuning into my gut feelings and intuition. If something doesn't feel right to me, I go with that and follow my instincts. Doing this has helped me get better outcomes for my project owner and the business. Trust yourself.

Consider #19 in light of the cost difference: “When do you need true real time processing? When fraud detection is critical, or when processing sensitive transactional data (credit card fraud detection, 911 calls). Other than that, delayed analytics (with a latency of a few seconds to 24 hours) is good enough.” Too often we get caught up in the story that we need real-time processing. I agree that certain transactions do need to be captured in real time, and financial processing is one of those situations. Much of the more administrative data capture, however, could be done at a delayed interval without disrupting business processing. Again, it is worth considering.

And yes, what good advice #22 is: “Ask the right questions before purchasing software.” How many times, while developing the business case, do we opt to purchase off-the-shelf software just because it's the easy option? Of course, there is also the scenario where the business owner or sponsor has been sold the software by the vendor and now believes it's the best thing since sliced bread and that they definitely need it… but that's for another post.

How could you use at least one of these tips the next time data-related issues arise, whether at project start-up or during the project?