While conducting a promotional impact analysis on an antihistamine drug, we encountered missing data for a third-party Non-Personal Promotion (NPP) program. Because our customer, a pharmaceutical company, had recently engaged a new vendor to run this program, performance metrics for three critical months were missing. These months were critical because they coincided with allergy season in the geography, so it was imperative to complete the dataset. We followed three different approaches based on the data type of the missing field. 1. For continuous metrics, ML imputation: since the data was temporal with seasonal components, a time series model was fitted to the previous 6 months and to the subsequent 6 months separately, and the mean of the two estimates was used as the imputed value. 2. For nominal metrics, mode imputation: the physician universe was divided into segments based on similar prescription-writing patterns and some other factors, and the most frequent value within each segment was used as the imputed value. 3. For binary metrics, LOCF and NOCB: the missing value was filled with the last observation carried forward (LOCF) or the next observation carried backward (NOCB), whichever non-missing data point was closer in time.
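The three approaches above can be sketched roughly as follows in pandas. This is only an illustration under stated assumptions, not the code used in the project: the function names and column names are hypothetical, a simple linear trend stands in for whatever time series model was actually fitted (so seasonality is ignored here), the continuous series is assumed to have a single contiguous gap with at least two clean observations on each side, and temporal distance is measured in row positions, which assumes evenly spaced observations. Ties between LOCF and NOCB go to LOCF, which is also an assumption.

```python
import numpy as np
import pandas as pd


def impute_continuous(series: pd.Series, window: int = 6) -> pd.Series:
    """Fill one contiguous gap with the mean of a forecast and a backcast.

    A linear trend is a stand-in for the real time series model.
    """
    values = series.to_numpy(dtype=float)
    missing = np.flatnonzero(np.isnan(values))
    if missing.size == 0:
        return series.copy()
    start, end = missing[0], missing[-1]
    before = np.arange(max(0, start - window), start)
    after = np.arange(end + 1, min(len(values), end + 1 + window))
    # Fit each side separately, then project both fits onto the gap.
    forecast = np.polyval(np.polyfit(before, values[before], 1), missing)
    backcast = np.polyval(np.polyfit(after, values[after], 1), missing)
    out = series.astype(float)
    out.iloc[missing] = (forecast + backcast) / 2
    return out


def impute_nominal(df: pd.DataFrame, segment_col: str, value_col: str) -> pd.Series:
    """Fill missing nominal values with the most frequent value in each segment."""
    def fill_mode(group: pd.Series) -> pd.Series:
        mode = group.mode(dropna=True)
        # A segment with no observed values is left untouched.
        return group.fillna(mode.iloc[0]) if not mode.empty else group

    return df.groupby(segment_col)[value_col].transform(fill_mode)


def impute_binary(series: pd.Series) -> pd.Series:
    """Fill each gap from the nearer neighbour: LOCF or NOCB (ties go to LOCF)."""
    pos = pd.Series(np.arange(len(series), dtype=float), index=series.index)
    observed = series.notna()
    dist_prev = pos - pos.where(observed).ffill()  # NaN if nothing earlier
    dist_next = pos.where(observed).bfill() - pos  # NaN if nothing later
    locf, nocb = series.ffill(), series.bfill()
    # Comparisons with NaN are False, so a missing previous value falls
    # through to NOCB; a missing next value is forced back to LOCF.
    use_prev = dist_prev.le(dist_next) | nocb.isna()
    return locf.where(use_prev, nocb)
```

For example, `impute_binary(pd.Series([1, np.nan, np.nan, np.nan, 0]))` fills the first two gaps from the earlier observation and the third from the later one.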
When dealing with missing or incomplete data, I usually use the listwise deletion technique. That is, prior to analysis, I exclude any rows that have missing data. By doing this, I can make sure that the data underlying my analysis is accurate and complete. Five months ago, I worked with a client where I discovered some missing data in their demographics after a recent marketing campaign study. I immediately decided to use the listwise approach, and after conducting my study, I was able to fully examine the data that was available and give my client sound advice. I demonstrated to them that the analysis was sound and useful, which helped the marketing plans succeed.
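A minimal sketch of listwise deletion in pandas is shown here. The function name, the column names, and the choice to also report the share of rows dropped are assumptions made for illustration; the original answer does not say which tool was used.

```python
import pandas as pd


def listwise_delete(df: pd.DataFrame, columns=None):
    """Drop every row with a missing value in `columns` (all columns if None).

    Returns the complete rows and the share of rows that were removed.
    """
    complete = df.dropna(subset=columns)
    dropped_share = 1 - len(complete) / len(df) if len(df) else 0.0
    return complete, dropped_share


# Hypothetical demographics table with one incomplete row.
demographics = pd.DataFrame({
    "age": [34, None, 51, 28],
    "region": ["North", "South", "East", "West"],
})
complete, dropped_share = listwise_delete(demographics)
```

Returning the dropped share alongside the cleaned table makes it easy to see how much of the sample the analysis no longer covers.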