VOO transforms its BI services and moves to the Cloud - Article in Solutions Magazine.
VOO is completing Memento, its business intelligence and big data transformation program, which includes a migration to the cloud. Find out how Micropole supported the operator during the different phases of the project.
In the context of a global transformation, Micropole helped VOO implement a complete migration of its Business Intelligence, Big Data and AI landscape to the cloud. This migration was essential to meet the company's strategic and urgent needs:
- Dramatically deepen customer insight to accelerate acquisition and improve loyalty and retention;
- Support digital transformation by providing a unified view of the customer and their behavior;
- Address new compliance challenges (GDPR);
- Drastically decrease the total cost of ownership of global data environments (4 different BI environments + 3 Hadoop clusters before transformation);
- Implement enterprise-wide data governance and address shadow BI (25+ FTEs across the enterprise devoted to data cleansing and processing).
The solution and results delivered by Micropole
Micropole conducted a quick study that reviewed all aspects of the transformation and identified the organizational challenges (roles and responsibilities, teams and skills, processes, governance) and the technical ones (global architecture scenarios, ranging from hybrid cloud to full cloud solutions in PaaS, or Platform-as-a-Service, mode).
Based on the study's conclusion, Micropole deployed an enterprise-wide cloud-based data platform to combine traditional BI processes with advanced analytics capabilities. Micropole helped redefine the data organization and related processes and introduced enterprise-level data governance.
Total cost of ownership has dropped to less than a third of what it was, while capacity and agility have improved significantly.
Architecture based on AWS key data services
Amazon S3 is used for the central input layer and for long-term storage.
Some data files are pre-processed on Amazon EMR. EMR clusters are created on the fly several times a day and only process the new data that has arrived in S3; once that data is processed and stored in Apache Parquet, a format optimized for analytics, the cluster is destroyed. Encryption and lifecycle management are enabled on most S3 buckets to meet security and cost-efficiency requirements. Over 600 TB of data is currently stored in the data lake. Amazon Athena is used to create and maintain a data catalog and to explore the raw data in the data lake.
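Lifecycle management of this kind is driven by a per-bucket rule set. The sketch below shows what such a rule could look like; the prefix, retention periods and storage classes are illustrative assumptions, not VOO's actual configuration.

```python
import json

# Illustrative S3 lifecycle rule (prefix and retention periods are
# hypothetical): raw files move to an infrequent-access tier after 90 days
# and to Glacier after one year, which is how lifecycle management keeps
# long-term storage costs down in a data lake of this size.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With boto3 this dict would be passed to
# s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_config)
print(json.dumps(lifecycle_config, indent=2))
```

The same JSON structure can be managed declaratively through CloudFormation or Terraform, which keeps lifecycle policy under version control alongside the rest of the platform.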
Amazon Kinesis Data Streams captures real-time data; this data is filtered and enriched (with data from the data warehouse) by a Lambda function before being stored in an Amazon DynamoDB database. Real-time data is also stored in dedicated S3 buckets for retention.
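The filter-and-enrich step described above can be sketched as a Lambda handler consuming a Kinesis event. Field names, event types and the in-memory lookup standing in for the warehouse enrichment are all illustrative assumptions.

```python
import base64
import json

# Hypothetical lookup standing in for the enrichment step; in the real
# pipeline this reference data would come from the Redshift data warehouse.
CUSTOMER_SEGMENT = {"C001": "premium", "C002": "standard"}

def handler(event, context=None):
    """Filter raw Kinesis records and enrich them before a DynamoDB write.

    Field names (customer_id, event_type) are illustrative, not VOO's schema.
    """
    items = []
    for record in event["Records"]:
        # Kinesis delivers each record payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("event_type") not in ("page_view", "call"):
            continue  # filter out events this use case does not need
        payload["segment"] = CUSTOMER_SEGMENT.get(payload["customer_id"], "unknown")
        items.append(payload)
    # In the real function each item would then be written with
    # table.put_item(Item=item) via boto3's DynamoDB resource.
    return items

# Minimal local invocation with one encoded record:
raw = json.dumps({"customer_id": "C001", "event_type": "call"}).encode()
event = {"Records": [{"kinesis": {"data": base64.b64encode(raw).decode()}}]}
print(handler(event))
```

Keeping the transformation in a pure function like this makes it easy to unit-test the filtering and enrichment logic without touching AWS.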
The data warehouse runs on Amazon Redshift, uses the new RA3 nodes and follows the Data Vault 2.0 methodology. Data Vault objects follow strict modeling rules, which enables a high degree of standardization and automation. The data model is generated from metadata stored in an Amazon Aurora (RDS) database.
The automation engine itself is built on Apache Airflow, deployed on EC2 instances.
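Because Data Vault objects are so uniform, their DDL can be generated mechanically from metadata and the loads orchestrated by Airflow. The sketch below illustrates the idea with a dict standing in for the Aurora metadata repository; table and column names are hypothetical, and the MD5-based hash key follows common Data Vault 2.0 practice rather than VOO's specific implementation.

```python
import hashlib

# Metadata that would live in the Aurora repository; names are illustrative.
hub_meta = {
    "name": "hub_customer",
    "business_key": "customer_number",
}

def hub_ddl(meta):
    """Generate the DDL for a Data Vault 2.0 hub from its metadata entry."""
    return (
        f"CREATE TABLE {meta['name']} (\n"
        f"  {meta['name']}_hk CHAR(32) NOT NULL,  -- hash key\n"
        f"  {meta['business_key']} VARCHAR(50) NOT NULL,\n"
        f"  load_dts TIMESTAMP NOT NULL,\n"
        f"  record_source VARCHAR(50) NOT NULL,\n"
        f"  PRIMARY KEY ({meta['name']}_hk)\n"
        f");"
    )

def hash_key(business_key_value):
    """Data Vault 2.0 style hash key: MD5 of the normalized business key."""
    return hashlib.md5(business_key_value.strip().upper().encode()).hexdigest()

print(hub_ddl(hub_meta))
print(hash_key("cust-0042"))
```

In a metadata-driven setup like this, adding a new source entity means inserting a metadata row rather than hand-writing DDL and load code, which is what keeps the warehouse automation scalable.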
Implementation began in June 2017; initially sized at six DC2 nodes, the production Redshift cluster has seamlessly evolved over time to meet the growing data needs of projects and overall business requirements.
Amazon DynamoDB is used for specific use cases where web applications require sub-second response times. Using DynamoDB's variable read/write capacity allows the more expensive high-performance read capacity to be provisioned only during business hours when low latency and fast response times are required. These mechanisms, which rely on the elasticity of AWS services, are used to optimize the monthly AWS bill.
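Time-based capacity switching like this is typically done with Application Auto Scaling scheduled actions. The sketch below builds the parameters such actions would take; the table name, capacity figures and cron expressions are illustrative assumptions.

```python
# Sketch of the scheduled-action parameters that raise DynamoDB read capacity
# during business hours and lower it again at night via Application Auto
# Scaling. Table name, capacity figures and cron expressions are illustrative.
def scheduled_action(table, name, cron, min_cap, max_cap):
    """Build the kwargs for application-autoscaling's put_scheduled_action."""
    return {
        "ServiceNamespace": "dynamodb",
        "ScheduledActionName": name,
        "ResourceId": f"table/{table}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "Schedule": cron,
        "ScalableTargetAction": {"MinCapacity": min_cap, "MaxCapacity": max_cap},
    }

# High capacity at 07:00 UTC on weekdays, back down at 19:00 UTC.
scale_up = scheduled_action("web_sessions", "business-hours-up",
                            "cron(0 7 ? * MON-FRI *)", 200, 400)
scale_down = scheduled_action("web_sessions", "business-hours-down",
                              "cron(0 19 ? * MON-FRI *)", 5, 25)

# Each dict would be passed to boto3's
# client("application-autoscaling").put_scheduled_action(**kwargs).
print(scale_up["Schedule"], scale_down["ScalableTargetAction"])
```

Paying for high read throughput only during the hours it is needed is exactly the elasticity-driven cost optimization the article describes.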
A series of predictive models were implemented, ranging from a classic churn prediction model to more advanced use cases. For example, a model was built to identify customers who are likely to have been affected by a network outage. Amazon SageMaker was used to build, train, and deploy the models at scale, leveraging data available in the Data Lake (Amazon S3) and Data Warehouse (Amazon Redshift).
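Once a model is deployed on a SageMaker endpoint, applications score customers by sending serialized feature vectors to it. The snippet below sketches that client side; the endpoint name and the feature list (tenure, spend, support calls, outages) are hypothetical.

```python
def churn_payload(features):
    """Serialize a feature vector as CSV for a SageMaker endpoint that
    expects text/csv input (the feature order here is illustrative)."""
    return ",".join(str(f) for f in features)

# Hypothetical features: tenure in months, monthly spend, support calls,
# recent network outages affecting the customer.
payload = churn_payload([26, 54.90, 3, 1])

# The real call goes through the sagemaker-runtime client:
# boto3.client("sagemaker-runtime").invoke_endpoint(
#     EndpointName="churn-model", ContentType="text/csv", Body=payload)
print(payload)
```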
API for external access
External parties need to access specific datasets securely and reliably, so Amazon API Gateway is used to deploy secure RESTful APIs on top of serverless data microservices implemented as Lambda functions.
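With API Gateway's Lambda proxy integration, each microservice is a handler that receives the HTTP request as an event and returns a status code and JSON body. A minimal sketch, with a hypothetical route and dataset (not VOO's actual API):

```python
import json

# Hypothetical dataset the microservice would really read from the platform.
DATASETS = {"network-status": [{"region": "Liege", "status": "ok"}]}

def handler(event, context=None):
    """Lambda proxy-integration handler behind API Gateway.

    Returns the dataset named in the path parameter, or 404. The route and
    dataset names are illustrative.
    """
    name = (event.get("pathParameters") or {}).get("dataset")
    if name not in DATASETS:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown dataset"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(DATASETS[name]),
    }

# Simulated API Gateway event for GET /datasets/network-status:
print(handler({"pathParameters": {"dataset": "network-status"}}))
```

Authentication, throttling and usage plans are then enforced at the API Gateway layer, so the Lambda code stays focused on the data itself.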
And much more!
The data platform that Micropole has built for VOO offers dozens of other possibilities. The wide range of services available in the AWS environment means that new use cases can be handled quickly and efficiently every day.