CEO Series

The inaccuracy of accurate data

John Lucker, former senior partner and global advanced analytics market leader at Deloitte, shares insights on data, innovation, and industry benchmarks.


Interviewer notes:
As Deloitte’s global advanced analytics market leader, John Lucker has been a guiding light in the world of insurance analytics for me for over a decade. John recently retired from Deloitte and now advises insurance, insurtech, and technology companies and boards, as well as state insurance regulators.

As a consultant, he brought to market numerous insurance underwriting, claims, operational, and distribution solutions that have become standards in the global industry.

He recently took some time to discuss some of his thoughts with me on the “inaccuracy of our accurate data” in the insurance industry. John’s reflections on data and how it is used in our industry are inspiring, the kind of things that can only come from a true insurance veteran and strategist who has worked with data models and analytics for decades.

For more information on John’s background, you can check out his website at www.johnlucker.com or his LinkedIn page.

Interview:

David Schapiro (DS): Could you please tell us a bit about yourself and your journey in the insurance industry?

John Lucker (JL): As you mentioned in your introduction, I recently retired as a Deloitte principal after almost 20 years working as a practice leader, innovator, and evangelist for many topics involving advanced analytics strategy in the insurance industry. My journey in the insurance industry started, prior to Deloitte, as an internal audit consultant at a large P&C insurer and later as the CTO and Controller of a reinsurance company. It was through these experiences, in 2000 and before, that I developed a strong appreciation and passion for the challenges companies face in governing their data and using it to improve underwriting, pricing, claims handling, and other core insurance operational processes. I saw that big data, advanced analytics, and the creative combination of methods into end-to-end solutions could help solve some of the most vexing business problems insurers face. At the core of my journey was the creation of proprietary intellectual property positioned within unique business methods to create powerful market differentiation.

DS: How would you define the insurance data ecosystem?

JL: In one word: fragmented. In another word: promising. Clearly data and information are the DNA of the insurance industry; they are what every part of the insurance enterprise must get its arms around to survive and thrive. However, despite the billions of dollars spent annually across the global insurance ecosystem on technology and tools, data warehouses and lakes, data governance projects, external vendor data, and dozens of new and evolving insurtech innovations, the insurance industry typically struggles with some very fundamental issues.

Many companies cannot holistically articulate their state of affairs with some core data concepts: accuracy, currentness, relevance, completeness, redundancy, accessibility, and others. This isn’t for lack of trying, but the challenges are broad and complex. Companies are making progress, but external stakeholders like customers, shareholders, analysts, rating agencies, business partners, and regulators are expecting better and faster progress, because those who are best at managing their data ecosystem are expected to lead the market in competitiveness and profitability.
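To make those dimensions concrete, here is a minimal illustrative sketch of how an insurer might score a dataset on a few of them (completeness, currentness, and redundancy); accuracy itself would require a ground-truth source to check against. The table and column names below are hypothetical, not from the interview.

```python
# Illustrative sketch only: scoring a policy table against a few of the
# data-quality dimensions named above. Column names are hypothetical.
import pandas as pd

def quality_profile(df: pd.DataFrame, key: str, updated_col: str) -> dict:
    """Return simple completeness, currentness, and redundancy scores."""
    # Completeness: share of cells that are populated.
    completeness = 1.0 - df.isna().sum().sum() / df.size
    # Currentness: share of rows touched in the last 2 years (arbitrary cutoff).
    cutoff = pd.Timestamp.now() - pd.DateOffset(years=2)
    currentness = (pd.to_datetime(df[updated_col]) >= cutoff).mean()
    # Redundancy: share of rows that duplicate another row's business key.
    redundancy = df.duplicated(subset=[key]).mean()
    return {
        "completeness": round(completeness, 3),
        "currentness": round(float(currentness), 3),
        "redundancy": round(float(redundancy), 3),
    }

policies = pd.DataFrame({
    "policy_id": ["P1", "P2", "P2", "P3"],
    "insured_name": ["Acme Co", None, "Beta LLC", "Gamma Inc"],
    "last_updated": ["2024-06-01", "2015-01-15", "2023-11-30", "2024-02-10"],
})
print(quality_profile(policies, key="policy_id", updated_col="last_updated"))
```

A profile like this does not make data correct, but it gives a company a way to "holistically articulate its state of affairs" on the measurable dimensions and track progress over time.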

Also, and critically, human capital is a huge challenge in the industry as institutional knowledge about a company’s data scatters in the winds of retirement with an aging industry workforce, poor institutional metadata, and incomplete data inventories and documentation. But, as I said earlier, there is promise because most companies understand the conundrum and are stepping up their efforts to embrace more robust and comprehensive data ecosystem governance programs.

DS: What are your concerns with this ecosystem with a specific focus on external data?

JL: Data brokers, vendors, bureaus, government or research sources, sensors, and IoT devices are only some of the many places insurers get external data for their internal purposes. While some of this data is subject to appropriate governance controls (e.g., credit bureau data), much of the data provided by thousands of data vendors and brokers is not.

As an example, in 2014 and 2017 I co-authored what I believe is one of the first research efforts documenting the inaccuracies of consumer biographic, demographic, and psychographic data sold by one of the largest data brokers. In our original article published on LinkedIn and in the formal study paper, “Predictably Inaccurate,” we found that, among a credible study sample, approximately half of the data about the consumers responding to the survey was incorrect. And it was incorrect for numerous reasons: the data wasn’t about the correct person, it was aged, it was dirty with incorrect values, it was imprecise, or modeled information was wrong and likely based on incorrect underlying data. Given that the study used data from one of the largest data brokers in the world, this is obviously concerning for many reasons.

Data like this gets used in the insurance industry for many purposes. Rating and pricing plans as well as predictive models are just some of the many ways data influences marketing, risk selection and underwriting, pricing, and claims handling. Bad data can create flawed models which can cause incorrect, disadvantageous, or unintentional discriminatory decisions to be made in many ways throughout the insurance process. Plus, external data is often combined algorithmically with a company’s internal data to create a new breed of synthetic data, which also needs a governance process around it. As you can see, the complexities, issues, and concerns are numerous with how external data is fed into the insurance data ecosystem.
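As one way to picture that governance need, here is a minimal sketch, under assumed field and vendor names, of enriching internal policy data with external vendor attributes while tagging every field with its source and retrieval date, so the resulting synthetic record stays traceable. This is an illustration, not any particular insurer’s implementation.

```python
# Hypothetical sketch: provenance-tagged enrichment of internal data with
# external vendor data, so derived ("synthetic") fields remain governable.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Sourced:
    value: object
    source: str   # e.g. "internal" or a vendor identifier
    as_of: date   # when the value was obtained

@dataclass
class EnrichedRisk:
    policy_id: str
    fields: dict[str, Sourced] = field(default_factory=dict)

def enrich(policy_id: str, internal: dict, external: dict, vendor: str) -> EnrichedRisk:
    risk = EnrichedRisk(policy_id)
    for k, v in internal.items():
        risk.fields[k] = Sourced(v, "internal", date.today())
    for k, v in external.items():
        # Keep vendor data distinct rather than silently overwriting internal data.
        risk.fields[f"{vendor}:{k}"] = Sourced(v, vendor, date.today())
    return risk

r = enrich("P1", {"annual_revenue": 1_200_000}, {"employee_count": 18}, "vendor_x")
for name, s in r.fields.items():
    print(name, s.value, s.source, s.as_of)
```

The design point is simply that once lineage is attached per field, a governance process can audit, age out, or quarantine vendor-sourced values without touching internal ones.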

DS: Let’s discuss your study a bit more. What were some other basic findings?

JL: First of all, and very surprisingly, the data was quite inaccurate. More than two-thirds of the survey respondents said their data was only 0 to 50% correct, and one-third said their data was only 0 to 25% accurate. The inaccuracies were wide and varied. Basic, publicly available information was also only about half correct. Vehicle ownership data was surprisingly inaccurate and reflected not only accuracy issues but also timing issues: vehicles long since disposed of were still on file as current, children born years earlier were missing from the family record, and homes purchased in the past did not appear in the data.

Other basic lifestyle and ownership information was quite flawed: marital status, length of time at residence, family members, and variables the vendor modeled from subordinate data, like household income range, were barely correct. Furthermore, the data vendor had a web-based function for consumers to correct their data, but respondents broadly refused to use it for a variety of reasons, including that it was “not my job to do their work and fix their errors.”

DS: Based on those findings, what are the potential ramifications for the insurance industry?

JL: Clearly the insurance industry needs data that is innovative and diverse in its uses while also being accurate, timely, and appropriately managed over time. While many insurers leverage external vendor data to augment or replace their internal data-gathering efforts for applications like intelligent risk selection, express underwriting, form pre-fill, and rapid claims settlement, the fact that such external data can have significant governance and accuracy issues is problematic and creates concerning ramifications for the industry.

From the perspective of the big data business, incorrect data can cause many issues for insurers: missed sales opportunities for good risks, sales made to inappropriate risks, policy mispricing, improper risk categorization, missed fraud cases, false positives for fraud flags, incorrect micro-segmentation of risks and customers, disparate impact where data unfairly influences decisions made for protected societal groups, and a host of other undesirable artifacts. Furthermore, errors in data can make algorithms and predictive models partially or entirely erroneous which can trigger significant compliance and regulatory issues and violations for insurers as well as generate societal concerns about the pervasiveness of models and predictions that prove to be flawed.

As it is, models and the data underlying them are under heavy scrutiny by consumer advocacy groups, regulators, the NAIC, the FTC, and other government watchdogs. So if data ecosystem issues remain prevalent, it seems inevitable that the data industry will find itself audited and regulated if it wants to be a greater participant in the US insurance industry.

DS: While this was for the consumer side of the industry, how is this relevant to the commercial insurance side of the industry?

JL: For many years, as an industry consultant, I used numerous data sources relevant to commercial insurance. While firmographic data was generally reliable and accurate, we still saw considerable inconsistencies between vendors, to the point that, depending on the use case, we learned to favor one vendor over another based on the relative strengths and weaknesses of their products’ accuracy, completeness, and currentness.

As the diversity of data sources for commercial insurance grows, just as it has on the consumer side, insurers must probe vendors thoroughly to understand how their data is generated, gathered, managed, maintained, and updated. And insurers must insist that vendors be transparent about their data governance policies, their information sourcing, and their rights to use the data for various purposes.

In some cases, it’s a bit of the wild west in the data vendor arena, and insurers must be sure they don’t go down a path of data usage that ends up being improper later – or worse. For example, I encountered one vendor who claimed to have a desirable category of information that was generally hard to get, but when I probed for details on where the data came from, it turned out they had scraped it from various websites, in clear violation of nearly all of those sites’ usage licenses.

DS: How are regulators and governmental agencies that oversee the insurance industry beginning to focus on issues pertaining to big data?

JL: Beyond consumer credit data, whose use is already well understood and regulated under the Fair Credit Reporting Act (FCRA), the governance of big data is, as we’ve discussed, fraught with challenges and often not done well by data brokers and vendors. The NAIC and several state insurance commissioners have begun to examine how external data is used broadly, how it flows into the various insurance processes regulators oversee, and the compliance and regulatory impacts of using such data. While insurers are the consumers of such data, ultimately the data and the business processes that use it affect insurance customers.

Regulators continuously remind insurers of their obligation to use big data and technology transparently and responsibly and to ensure full compliance with anti-discrimination laws and guidelines. In the end, it’s the insurers that are held responsible. But that view is beginning to change as regulators come to see data vendors and brokers as economic participants in the insurance industry who therefore need to be held responsible as well.

I see the data industry heading down a path of heightened regulation, with the potential to be examined and audited by insurance regulators in much the same way as insurers. In some states, regulators have such authority today; in others, legislatures need to grant them additional authority. But I believe the big data industry is headed in this direction, and in my view, it is long overdue. If data vendors want to be an integral part of the insurance data ecosystem, then they should be prepared to conduct their business in a manner consistent with regulatory and compliance expectations.

DS: Could you elaborate on why this is being done and what such focus means for insurers and data vendors?

JL: When the data insurers used was only internally gathered information (from agents, policy applications, claim files, regulated rating bureaus, etc.), insurers could reasonably be held responsible for the quality and integrity of that data and its governance. As consumer credit bureau data emerged and became widely used, the FCRA evolved to regulate credit data usage, but the ecosystem was still relatively narrow and controlled. Today, however, any individual insurer depends on literally dozens of data providers across a wide expanse of use cases. In addition, most data vendors, for competitive and proprietary reasons, are not very transparent with their insurer customers about their data governance policies and procedures – a great deal is simply taken on trust.

While regulators still expect insurers to be responsible for how such external data is used, many industry watchers believe it is unfair to shield data brokers and vendors from liability for their data governance practices (or, in some cases, malpractices). If such data providers want to be part of the insurance marketplace, then they should manage their business practices in a manner consistent with the expectations of insurance industry stakeholders.

I believe the industry is evolving to the point that data vendors/brokers will be expected to certify through industry standard controls audits and attestations that the end-to-end data governance of the data they sell is up to standards mandated by insurance and consumer trade regulators. I also believe that in some states, we are headed towards data vendors being required to be registered to serve the insurance industry. I think that this will be beneficial to insurers and consumers and should serve to improve data governance and data quality significantly.

DS: What future issues and trends do you see impacting big data in the insurance industry and its service and product providers, and what should the industry ecosystem be proactively thinking about and planning for?

JL: Data and cyber security are clearly of great concern to consumers, insurance business leaders, and regulators. We have all read of numerous breaches in which millions (sometimes hundreds of millions) of customers have had their data exposed. It seems that in the same way insurers are expected to meet defined security standards, so too should all data vendors and brokers serving the insurance industry. Related to this issue is privacy. While many consumers are at peace with giving up some of their information privacy, I’ve been seeing a growing discomfort with the imbalanced “value equation,” where consumers and businesses give up too much privacy in return for unknown or insufficient benefits. As a result, many customers have begun to take more control of their data and are using new regulations to self-audit, block, and/or delete their external information from vendor repositories.

Some believe it is the intrusiveness of data gathering that is causing some of this retraction of data comfort, but I believe it’s also a result of the common error issues in data and the impact of those errors on consumers. For example, our study found that some consumers wanted their data deleted because if it can’t be right, then they’d prefer it not to be used at all.

Lastly, another issue is more philosophical – just because data can be gathered doesn’t mean it should be. I believe all participants in the data ecosystem need to live by a code of conduct with regard to data privacy, accuracy, usage, ethical boundaries, and how data is maintained. If the data ecosystem can’t police itself through some independent governance mechanism, it is inevitable that the long arm of government will do it instead, and that may not be best for all stakeholders. The extensive stipulations of the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) are examples of what happens when the data ecosystem gets a bit too wild and the government sees the need to step in to rebalance the “value equation.”

DS: With regards to big data for insurance, what are you most encouraged by and excited about for the future of the industry?

JL: It’s exciting to me how innovation with IoT and insurtech as well as creative machine learning and AI applications are bringing new forms of useful data to the insurance industry. Such data and the resulting insights should serve to innovate new risk products, better match price to risk, improve proactive loss control, reduce fraud (which costs everyone too much), improve societal safety and wellbeing, and a host of other social goods. The usefulness of data throughout the insurance ecosystem will continue to explode, and with sound governance and responsible management across all stakeholders, big data should continue to serve up groundbreaking improvements in the industry.
