InsurTech for Risk Management

November 2022

Kris Kogut
Principal, Risk Modeling Services, PwC US
Peichung Shih
Director, Risk Modeling Services, PwC US

Part 1: Data Infrastructure

Tackling data risk: How risk managers get to modern data architecture

In our digital landscape, what we know — the past, present and future potential — is represented by one word: data. And the accuracy, accessibility and security of that data are pivotal to business strategy and stakeholder trust.

For risk managers, data can be both an asset and a liability. On the one hand, information provides tremendous insight when managing, mitigating and monitoring an organization's risks. On the other, if your data isn't properly sourced, or worse, is simply wrong, it can become a liability.

Many organizations are undergoing a transformation, investing in data infrastructure that can reduce sourcing challenges going forward.

Why one source of truth matters

Data can come from anywhere. It is generated internally as often as it is derived from external sources. Most information is initially gathered from emails, Word documents, PDFs or loss reports and then downloaded to Excel.

As information passes between many parties, the source of truth (the practice of structuring information models so they reflect agreement on a single set of elements) can be compromised. Manual data entry, the use of spreadsheets and human error, for example, may raise controllership concerns. Further, stakeholders with the same data needs often obtain their information independently, which results in inefficiencies and frustration when numbers do not match.

Without a defined structure and process in place to determine who owns the data or whether it’s reliable, the ultimate source of the truth is often difficult to unravel.

How it begins: Creating a roadmap

The key to having one source of truth is establishing the infrastructure that enables a single repository of data. With planning and direction, you can scale operational activities in the following areas:

  • Management reporting of total cost of risk (TCOR)
  • Analytics
  • Funding and reserving strategy
  • Oversight of third-party administrators or claims management
  • Risk decision-making

Creating a roadmap for a data infrastructure can help you define a unified approach to your transformation journey. Begin with four key building blocks; a brief configuration sketch of the first two follows the list:

  1. Establish and document data control requirements to confirm completeness and accuracy.
  2. Define the source of truth for risk data, including premium, expense, claims and exposure data.
  3. Establish scalable data interfaces with internal systems and key vendors for data pipelines.
  4. Capture, standardize, aggregate and automate the data.
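
As a simple illustration of the first two building blocks, the sketch below shows one way the control requirements and designated source of truth for each risk data domain could be documented as configuration that downstream pipelines can read. The domain names, fields, owners and thresholds are hypothetical, not a prescribed standard.

    # Hypothetical control-requirements configuration for core risk data domains.
    # Domain names, required fields and thresholds are illustrative only.
    RISK_DATA_CONTROLS = {
        "claims": {
            "source_of_truth": "tpa_monthly_loss_run",   # designated authoritative feed
            "required_fields": ["claim_id", "loss_date", "paid", "reserve", "status"],
            "completeness_threshold": 0.99,              # share of rows with all required fields
            "accuracy_checks": ["paid >= 0", "loss_date <= report_date"],
        },
        "premium": {
            "source_of_truth": "carrier_bordereau",
            "required_fields": ["policy_id", "effective_date", "written_premium"],
            "completeness_threshold": 1.0,
            "accuracy_checks": ["written_premium >= 0"],
        },
        "exposure": {
            "source_of_truth": "internal_exposure_system",
            "required_fields": ["location_id", "tiv", "occupancy"],
            "completeness_threshold": 0.98,
            "accuracy_checks": ["tiv > 0"],
        },
    }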

Bringing information together at a common granularity enables analysis that isn't achievable under a more disparate approach. A plan that accounts for the many data components designed to act in concert, whether through migration or modernization, can help your infrastructure mature smoothly into the future.

Using Amazon Web Services (AWS)¹ as an example of modern data infrastructure, we can examine how a state-of-the-art cloud migration works for an organization.

Building the structure: Placing the bricks of the foundation

1. Data lake. Before defining the purpose of the data, sourced information needs to be pooled into a single repository, the data lake.

Because risk information can arrive in many forms (e.g., flat files, PDFs, videos), the data lake becomes a central point of access. Specifically, a data lake centrally houses risk data from all sources, including third-party administrators, carriers, vendors and others. Its two main purposes are:

  • Metadata extraction and cataloging files by type, size, origin or context
  • Data quality verification

A metadata store implementation and the native access controls offered by AWS enable the right data to be securely identified and accessed by the proper users, groups and roles.

Data quality verification applies information quality rules to the sourced data. Structured and semi-structured data (cataloged and retrievable through the AWS Glue Data Catalog) and unstructured data (with metadata retrievable through Amazon DynamoDB) can then be run through jobs that perform data quality checks, transform files and capture metadata.
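
As a minimal sketch of these two purposes, the Python snippet below uses the AWS SDK (boto3) to capture basic metadata for a file landed in the lake and to apply a simple completeness rule. The bucket, the DynamoDB table name (risk-data-catalog) and the required columns are illustrative assumptions.

    import csv
    import io

    import boto3

    s3 = boto3.client("s3")
    catalog = boto3.resource("dynamodb").Table("risk-data-catalog")  # hypothetical metadata store

    REQUIRED_COLUMNS = {"claim_id", "loss_date", "paid"}  # illustrative quality rule


    def register_and_check(bucket: str, key: str) -> bool:
        """Capture metadata for a landed file and run a basic data quality check."""
        head = s3.head_object(Bucket=bucket, Key=key)

        # Metadata extraction and cataloging by type, size and origin
        catalog.put_item(Item={
            "file_key": key,
            "bucket": bucket,
            "size_bytes": head["ContentLength"],
            "content_type": head.get("ContentType", "unknown"),
            "last_modified": head["LastModified"].isoformat(),
        })

        # Data quality verification: confirm required columns are present (small files only)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        header = next(csv.reader(io.StringIO(body)))
        return REQUIRED_COLUMNS.issubset(header)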

2. Ingestion. Once your information is in the lake, it can be organized through a process called ingestion. Upstream data is fed into storage through batch jobs or streaming services, and processed data is then partitioned further for downstream applications. For data lake ingestion and storage, Amazon S3 offers object-level access control, native logging for compliance, flexible storage capacity and cost efficiency.
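
For illustration only, the snippet below lands a processed monthly loss run in a hypothetical S3 bucket using a Hive-style partitioned key prefix, so downstream jobs can read just the slices they need.

    import boto3

    s3 = boto3.client("s3")


    def ingest_loss_run(local_path: str, year: int, month: int) -> str:
        """Upload a processed loss run to a partitioned location in the data lake."""
        # Hive-style partitions (year=/month=) let downstream engines prune by period
        key = f"processed/claims/year={year}/month={month:02d}/loss_run.parquet"
        s3.upload_file(local_path, "risk-data-lake", key)  # hypothetical bucket name
        return key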

3. Warehouse and analytics. A data warehouse is a purpose-built data storage center with a framework designed to optimize data query and insight discovery. It reads information from the data lake, establishes relationships among entities and enforces data integrity across the data domains.

Using business intelligence tools that can accomplish these tasks is critical to reaching your infrastructure destination. Here are some examples of tools that work in concert:

  • Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes analyzing data simple and cost-effective. Queries can be run on datasets loaded in the clusters (or directly in the data lake with the Spectrum feature), with the ability to write data back to the data lake in open formats.
  • Amazon QuickSight is a fast, easy-to-use, cloud-native business analytics service for building visualizations, performing ad hoc analysis and quickly getting business insights from data. A TCOR dashboard deployed to QuickSight, for example, consumes data from the source of truth (e.g., Redshift) and tailors insights for different users (risk analysts, chief risk officers, etc.) at the appropriate level of granularity.
  • Amazon Athena offers a query engine to discover data stored and registered in the Glue Data Catalog. It's often used to explore data in its raw form to derive insights not yet included in the current analytics, and it plays a key role in a continuously evolving risk management process.

The cleansed and enriched data is consumed by downstream analytical tools and activities, such as the TCOR dashboard, ad hoc claims analytics and additional insights derived from machine learning models in Amazon SageMaker.
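
As an example of the ad hoc path, the sketch below submits a query through Athena and polls until it completes. The Glue database, results bucket and table are assumptions made for the sketch.

    import time

    import boto3

    athena = boto3.client("athena")


    def run_adhoc_query(sql: str) -> str:
        """Submit an Athena query against the risk data catalog and wait for completion."""
        execution = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": "risk_lake"},  # hypothetical Glue database
            ResultConfiguration={"OutputLocation": "s3://risk-athena-results/"},  # hypothetical bucket
        )
        query_id = execution["QueryExecutionId"]

        while True:
            state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                return state
            time.sleep(2)  # poll until the engine finishes


    # Example: open claim counts by coverage line
    run_adhoc_query("SELECT coverage, COUNT(*) FROM claims WHERE status = 'open' GROUP BY coverage")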

4. Transformation and orchestration. AWS offers a wide range of compute options (including Lambda, Glue jobs, EMR and Glue DataBrew) for data aggregation, enrichment and transformation, depending on the type of workload. AWS Lambda, for example, offers a cost-efficient, serverless function-as-a-service suitable for ingesting and processing files smaller than 1GB. Glue jobs and EMR offer distributed processing power that scales to handle large files with complex transformation logic.
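
A minimal sketch of the small-file case follows: a Lambda handler that standardizes a raw CSV dropped in the lake and writes it to a processed zone. The bucket names, column mapping and zone prefixes are hypothetical; larger or more complex workloads would shift to Glue or EMR as noted above.

    import csv
    import io

    import boto3

    s3 = boto3.client("s3")

    # Illustrative mapping from vendor column names to the standardized schema
    COLUMN_MAP = {"ClaimNumber": "claim_id", "DateOfLoss": "loss_date", "PaidAmount": "paid"}


    def handler(event, context):
        """Standardize a small raw CSV and write it to the processed zone of the lake."""
        record = event["Records"][0]["s3"]  # S3 event notification payload
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = [
            {COLUMN_MAP.get(col, col): value for col, value in row.items()}
            for row in csv.DictReader(io.StringIO(raw))
        ]
        if not rows:
            return  # nothing to standardize

        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

        s3.put_object(
            Bucket="risk-data-lake",  # hypothetical processed bucket
            Key=key.replace("raw/", "processed/"),
            Body=out.getvalue().encode("utf-8"),
        )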

In modern data and business processing pipelines, multiple transformations are often needed to modularize the logic and increase execution efficiency and code maintainability.

Two commonly implemented patterns are orchestration and choreography:

  • Orchestration employs an orchestrator (AWS Glue Workflow, Step Functions) to manage the processes. It knows which step in the pipeline should be executed at a given moment and then monitors its progress.
  • Choreography doesn't have a master node overseeing the whole process but instead relies on an event-based architecture (S3, Lambda) in which the steps in the pipeline communicate with one another indirectly through events, similar to microservices.

Orchestration and choreography each have advantages and disadvantages. While the orchestration pattern provides a unified view of the pipeline, it's tightly coupled and less flexible to scale.

The choreography pattern, on the other hand, is loosely coupled and easy to scale: each step knows only when and how to execute its core functionality based on the events it subscribes to. However, the shared-nothing pipeline can be difficult to monitor, and failed steps can be hard to reprocess.

With the breadth of AWS services, a hybrid approach can take advantage of both patterns: the choreography pattern executes the processing logic, while the orchestration pattern is implemented with a central event manager, such as Simple Notification Service (SNS), and an orchestrator, such as Amazon Managed Workflows for Apache Airflow (MWAA), to consume all events and persist the execution logs.
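
A thin version of that hybrid idea is sketched below: each processing step publishes its status to a shared SNS topic (choreography), while a single consumer persists every event to an execution log that an orchestrator can query. The topic ARN and log table name are assumptions.

    import json
    from datetime import datetime, timezone

    import boto3

    sns = boto3.client("sns")
    log_table = boto3.resource("dynamodb").Table("pipeline-execution-log")  # hypothetical log store

    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:risk-pipeline-events"  # hypothetical topic


    def publish_step_event(step: str, status: str, detail: str = "") -> None:
        """Choreography side: each step announces its outcome on the shared topic."""
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"step": step, "status": status, "detail": detail}),
        )


    def log_event_handler(event, context):
        """Central consumer: persist every pipeline event for monitoring and reprocessing."""
        for record in event["Records"]:  # SNS-to-Lambda delivery format
            message = json.loads(record["Sns"]["Message"])
            log_table.put_item(Item={
                "step": message["step"],
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "status": message["status"],
                "detail": message.get("detail", ""),
            })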

Past experience has shown the most advantageous solution architectures are often bespoke and optimized to fit a client’s specific use case.

5. Monitoring, audit and alert. A robust data monitoring and alert framework is vital to a successful risk management program. Because risk data comes from various vendors at different frequencies, keeping track of data quality across the hundreds of files received each month can be a daunting task.

With Amazon CloudWatch, SNS and Simple Email Service (SES), a scalable monitoring and alert framework can be set up that integrates seamlessly into the risk data platform, giving risk management professionals confidence in the accessibility of their data.

  • Amazon CloudWatch collects and aggregates metrics about infrastructure, transactions and applications. When anomalies are detected (such as delayed files, failed data quality rules or unexpected new data elements), corresponding events are raised. Those events, in turn, trigger the notification services (SNS, SES, etc.) to alert data stewards and vendors in real time.
  • Amazon SNS is a versatile notification service that follows the pub-sub (publish-subscribe) messaging paradigm: when CloudWatch raises an event, SNS publishes a notification to its subscribers via email, SMS or even third-party applications such as Slack. This gives program managers the flexibility to choose the most effective communication channels and provides real-time monitoring capability to risk managers.

Amazon SES, on the other hand, is a fully managed email service that gives risk managers full control over every aspect of an email (e.g., addresses, subject, body, styling, custom logic) that SNS doesn't offer. If email is the main notification mechanism, SES is usually the better choice because of its customization capability. Note that SES cannot be invoked by CloudWatch directly; a notification Lambda function is needed to consume the event from CloudWatch and invoke SES.
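
A minimal sketch of such a notification Lambda follows; the sender address, recipient list and event fields are assumptions. It consumes the CloudWatch event and sends a formatted alert through SES.

    import json

    import boto3

    ses = boto3.client("ses")

    SENDER = "risk-data-alerts@example.com"    # hypothetical verified SES identity
    RECIPIENTS = ["data-steward@example.com"]  # hypothetical distribution list


    def handler(event, context):
        """Consume a CloudWatch event and send a tailored SES alert."""
        detail = event.get("detail", {})
        subject = f"Risk data alert: {detail.get('anomaly', 'unknown issue')}"
        body = (
            "An anomaly was detected in the risk data platform.\n\n"
            + json.dumps(detail, indent=2, default=str)
        )

        ses.send_email(
            Source=SENDER,
            Destination={"ToAddresses": RECIPIENTS},
            Message={
                "Subject": {"Data": subject},
                "Body": {"Text": {"Data": body}},
            },
        )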

6. Security. Building effective technology that enhances data collection, interpretation and use requires careful planning and vision. A service such as AWS Lake Formation helps build, secure and manage the data lake in three steps:

  1. Identifies existing data stores in S3 or relational and NoSQL databases, and moves the data into your data lake.
  2. Catalogs and prepares the data for analytics.
  3. Provides users with secure self-service access to the data through their choice of analytics services.
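
To illustrate the third step, the snippet below grants a hypothetical analyst role read-only access to a cataloged claims table through Lake Formation, rather than through bucket-level policies; the database, table and role names are assumptions.

    import boto3

    lakeformation = boto3.client("lakeformation")

    # Grant an analyst role SELECT access to the cataloged claims table (names are illustrative)
    lakeformation.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/risk-analyst"},
        Resource={"Table": {"DatabaseName": "risk_lake", "Name": "claims"}},
        Permissions=["SELECT"],
    )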

AWS Lake Formation manages these tasks and integrates with the underlying data stores and analytics services.

Other AWS services and third-party applications can also access the data through those services. Even with the proper data architecture in place, risk managers still need to perform thorough analysis on the information available to them.

Takeaway

Risk managers often spend significant amounts of time and resources managing data. Implementing a data infrastructure, whether on AWS or another cloud service, either independently or in concert with the broader organization's cloud strategy, will give risk managers more time to do what they do best: manage risk.

Look for a detailed examination of risk management analytics in our Next in Series: Analytics

¹The tools, capabilities and descriptions provided herein are based on publicly available information on Amazon Web Services (AWS) (see aws.amazon.com).