Cloud Migration Resources | Unravel Data

Unravel Data Cloud Migration and Management Datasheet

Thank you for your interest in the Unravel Data Cloud Migration and Management Datasheet. You can download it here.

Roundtable Recap: DataOps Just Wanna Have Fun

We like to keep things light at Unravel. In a recent event, we hosted a group of industry experts for a night of laughs and drinks as we discussed cloud migration and heard from our friends at Don’t Tell Comedy.

Unravel VP of Solutions Engineering Chris Santiago and AWS Sr. Worldwide Business Development Manager for Analytics Kiran Guduguntla moderated a discussion with data professionals from Black Knight, TJX Companies, AT&T Systems, Georgia Pacific, and IBM, among others.

This post briefly recaps that discussion.

The cloud journey

To start, Chris asked attendees where they were in their cloud journey. The top two responses were tied: hybrid cloud and on-prem. Following those were cloud-native, cloud-enabled, and multi-cloud workloads.

Kiran, who focuses primarily on migrations to EMR, said he wasn’t surprised by these results. The industry has seen significant churn in the past few years, especially among organizations moving off Hadoop. As clients move to the cloud, EMR continues to lead the pack as a top choice for users.

Migration goals

As a follow-up question, we conducted a second poll to learn about the migration goals of our attendees. Are they prioritizing cost optimization? Seeking greater visibility? Boosting performance? Or are they looking for ways to better automate and decrease time to resolution?

Unsurprisingly, the number one migration goal was reducing and optimizing resource usage. Cost is king. Kiran explained the results of an IDC study that followed users as they migrated their workloads from on-premises environments to EMR. The study found that customers saved about 57%, and the ROI over five years rose to 350%.

He emphasized that cost isn’t the only benefit of migration from on-prem to the cloud. The shift allows for better management, reduced administration, and better performance. Customers can run their workloads two or three times faster because of the optimization included in EMR frameworks.

Data security and privacy in the cloud

One attendee steered the conversation toward the questions many clients are asking: How can I be sure of data security? Their priority is meeting regulatory compliance and taking every step to ensure they aren’t hacked. The main concern is not how to use the cloud, but how to secure it.

Kiran agreed, emphasizing that security is paramount at AWS. He explained the security features AWS implements to promote data security:

1. Data encryption

AWS encrypts data both at rest in S3 and while it’s in motion to S3.

2. Access control

Fine-grained access control using AWS Lake Formation, combined with robust audit controls, limits data access.

3. Compliance

AWS meets every major compliance requirement, including GDPR.

He continued, noting that making these features available is only a start: it is essential to architect them to meet each user’s or client’s particular requirements.
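
To make those items a bit more concrete, here is a minimal sketch of what the first two can look like in practice with boto3. This is illustrative only, not from the event; the bucket name, role ARN, and database/table names are placeholders.

    import boto3

    s3 = boto3.client("s3")
    lf = boto3.client("lakeformation")

    # 1. Data encryption: turn on default server-side encryption for a bucket
    #    (data in motion is protected separately by using HTTPS/TLS endpoints).
    s3.put_bucket_encryption(
        Bucket="example-data-lake-bucket",  # placeholder bucket name
        ServerSideEncryptionConfiguration={
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
            ]
        },
    )

    # 2. Access control: grant a role SELECT on a single table via Lake Formation,
    #    so consumers see only the data they are entitled to.
    lf.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/analyst-role"},  # placeholder
        Resource={"Table": {"DatabaseName": "sales_db", "Name": "orders"}},  # placeholder names
        Permissions=["SELECT"],
    )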

 Interested in learning more about Unravel for EMR migrations? Start here.

Webinar Recap: Functional strategies for migrating from Hadoop to AWS

In a recent webinar, Functional (& Funny) Strategies for Modern Data Architecture, we combined comedy and practical strategies for migrating from Hadoop to AWS. 

Unravel Co-Founder and CTO Shivnath Babu moderated a discussion with AWS Principal Architect, Global Specialty Practice, Dipankar Ghosal and WANdisco CTO Paul Scott-Murphy. Here are some of the key takeaways from the event.

Hadoop migration challenges

Business computing workloads are moving to the cloud en masse to achieve greater business agility, gain access to modern technology, and reduce operational costs. But identifying what you have running, understanding how it all works together, and mapping it to a cloud topology is extremely difficult when you have hundreds, if not thousands, of data pipelines and are dealing with tens of thousands of data sets.

Hadoop-to-AWS migration poll

We asked attendees what their top challenges have been during cloud migration and how they would describe the current state of their cloud journey. Not surprisingly, the complexity of their environment was the #1 challenge (71%), followed by the “talent gap” (finding people with the right skills).

The unfortunate truth is that most cloud migrations run over time and over budget. However, when done right, moving to the cloud can realize spectacular results.

How AWS approaches migration

Dipankar talked about translating a data strategy into a data architecture, emphasizing that the data architecture must be forward-looking: it must be able to scale in terms of both size and complexity, with the flexibility to accommodate polyglot access. That’s why AWS does not recommend a one-size-fits-all approach, as it eventually leads to compromise. With that in mind, he talked about different strategies for migration.

Hadoop to Amazon EMR Migration Reference Architecture

He recommends a data-first strategy for complex environments where it’s a challenge to find the right owners to define why the system is in place. Plus, he said, “At the same time, it gives the business the data availability on the cloud, so that they can start consuming the data right away.”

The other approach is a workload-first strategy, which is favored when migrating a relatively specialized part of the business that needs to be refactored (e.g., Pig to Spark).

He wrapped up by describing a process with different “swim lanes where every persona has skin in the game for a migration to be successful.”

Why a data-first strategy?

Paul followed up with a deeper dive into a data-first strategy. Specifically, he pointed out that in a cloud migration, “people are unfamiliar with what it takes to move their data at scale to the cloud. They’re typically doing this for the first time, it’s a novel experience for them. So traditional approaches to copying data, moving data, or planning a migration between environments may not be applicable.” The traditional lift-and-shift application-first approach is not well suited to the type of architecture in a big data migration to the cloud.

a data-first approach to cloud migration

Paul said that the WANdisco data-first strategy looks at things from three perspectives:

  • Performance: Obviously moving data to the cloud faster is important so you can start taking advantage of a platform like AWS sooner. You need technology that supports the migration of large-scale data and allows you to continue to use it while migration is under way. There cannot be any downtime or business interruption. 
  • Predictability: You need to be able to determine when data migration is complete and plan for workload migration around it.
  • Automation: Make the data migration as straightforward and simple as possible, to give you faster time to insight, to give you the flexibility required to migrate your workloads efficiently, and to optimize workloads effectively.

How Unravel helps before, during, and after migration

Shivnath went through the pitfalls encountered at each stage of a typical migration to AWS (assess; plan; test/fix/verify; optimize, manage, scale). He pointed out that it all starts with careful and accurate planning, then continuous optimization to make sure things don’t go off the rails as more and more workloads migrate over.

Unravel helps at each stage of cloud migration

And to plan properly, you need to assess what is a very complex environment. All too often, this is a highly manual, expensive, and error-filled exercise. Unravel’s full-stack observability collects, correlates, and contextualizes everything that’s running in your Hadoop environment, including identifying all the dependencies for each application and pipeline.

Then once you have this complete application catalog, with baselines to compare against after workloads move to the cloud, Unravel generates a wave plan for migration. Having such accurate and complete data-based information is crucial to formulating your plan. Usually when migrations go off schedule and over budget, it’s because the original plan itself was inaccurate.

Then after workloads migrate, Unravel provides deep insights into the correctness, performance, and cost. Performance inefficiencies and over-provisioned resources are identified automatically, with AI-driven recommendations on exactly what to fix and how.

As more workloads migrate, Unravel empowers you to apply policy-based governance and automated alerts and remediation so you can manage, troubleshoot, and optimize at scale.

Case study: GoDaddy

The Unravel-WANdisco-Amazon partnership has proven success in migrating a Hadoop environment to EMR. GoDaddy had to move petabytes of actively changing “live” data when the business depends on the continued operation of applications in the cluster and access to its data. They had to move an 800-node Hadoop cluster with 2.5PB of customer data that was growing by more than 4TB every day. The initial (pre-Unravel) manual assessment took several weeks and proved incomplete: only 300 scripts were discovered, whereas Unravel identified over 800.

GoDaddy estimated that its lift-and-shift migration would cost $7 million to operate, but Unravel’s AI optimization capabilities identified savings that brought the cloud costs down to $2.9 million. Using WANdisco’s data-first strategy, GoDaddy was able to complete its migration process on time and under budget while maintaining normal business operations at all times.

Q&A

The webinar wrapped up with a lively Q&A session where attendees asked questions such as:

  • We’re moving from on-premises Oracle to AWS. What would be the best strategy?
  • What kind of help can AWS provide in making data migration decisions?
  • What is your DataOps strategy for cloud migration?
  • How do you handle governance in the cloud vs. on-premises?

To hear our experts’ specific responses to these questions, as well as the full presentation, click here to get the webinar on demand (no form to fill out!).

Webinar Recap: Optimizing and Migrating Hadoop to Azure Databricks

The benefits of moving your on-prem Spark Hadoop environment to Databricks are undeniable. A recent Forrester Total Economic Impact (TEI) study reveals that deploying Databricks can pay for itself in less than six months, with a 417% ROI from cost savings and increased revenue & productivity over three years. But without the right methodology and tools, such modernization/migration can be a daunting task.

Capgemini’s VP of Analytics Pratim Das recently moderated a webinar with Unravel’s VP of Solutions Engineering Chris Santiago, Databricks’ Migrations Lead (EMEA) Amine Benhamza, and Microsoft’s Analytics Global Black Belt (EMEA) Imre Ruskal to discuss how to reduce the risk of unexpected complexities, avoid roadblocks, and prevent cost overruns.

The session Optimizing and Migrating Hadoop to Azure Databricks is available on demand, and this post briefly recaps that presentation.

Pratim from Capgemini opened by reviewing the four phases of a cloud migration—assess; plan; test, fix, verify; optimize, manage, scale—and polling the attendees about where they were on their journey and the top challenges they have encountered. 

Migrating Hadoop to Databricks poll question

How Unravel helps migrate to Databricks from Hadoop

Chris ran through the stages an enterprise goes through when doing a cloud migration from Hadoop to Databricks (really, any cloud platform), with the different challenges associated with each phase. 

4 stages of cloud migration

Specifically, profiling exactly what you have running on Hadoop can be a highly manual, time-consuming exercise that can take 4-6 months, requires domain experts, can cost over $500K—and even then is still usually about 30% inaccurate and incomplete.

This leads to problematic planning. Because you don’t have complete data and have missed crucial dependencies, you wind up with inaccurate “guesstimates” that delay migrations by 9-18 months and underestimate TCO by 3-5X.

Then once you’ve actually started deploying workloads in the cloud, users are too often frustrated that workloads are running slower than they did on-prem. Manually tuning each job to meet SLAs takes about 20 hours, increasing migration expenses by a few million dollars.

Finally, migration is never a one-and-done deal. Managing and optimizing the workloads is a constant exercise, but fragmented tooling leads to cumbersome manual management and lack of governance results in ballooning cloud costs.

Chris Santiago shows over a dozen screenshots illustrating Unravel capabilities to assess and plan a Databricks migration. Click on image or here to jump to his session.

Chris illustrated how Unravel’s data-driven approach to migrating to Azure Databricks helps alleviate and solve these challenges. Specifically, Unravel answers questions you need to ask to get a complete picture of your Hadoop inventory:

  • What jobs are running in your environment—by application, by user, by queue? 
  • How are your resources actually being utilized over a lifespan of a particular environment?
  • What’s the velocity—the number of jobs that are submitted in a particular environment—how much Spark vs. Hive, etc.?
  • What pipelines are running (think Airflow, Oozie)?
  • Which data tables are actually being used, and how often? 

Then once you have a full understanding of what you’re running in the Hadoop environment, you can start forecasting what this would look like in Databricks. Unravel gathers all the information about what resources are actually being used, how many, and when for each job. This allows you to “slice” the cluster to start scoping out what this would look like from an architectural perspective. Unravel takes in all those resource constraints and provides AI-driven recommendations on the appropriate architecture: when and where to use auto-scaling, where spot instances could be leveraged, etc.
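
As a rough illustration of where those recommendations point (this is a hypothetical example, not actual Unravel output), an Azure Databricks cluster spec that uses autoscaling and spot instances might look something like the sketch below; the runtime label, VM size, and worker counts are placeholder assumptions.

    # Hypothetical cluster spec illustrating autoscaling plus spot capacity on
    # Azure Databricks; the values are placeholders, not recommendations.
    cluster_spec = {
        "cluster_name": "migrated-etl-cluster",
        "spark_version": "11.3.x-scala2.12",          # assumed runtime label
        "node_type_id": "Standard_DS3_v2",            # assumed VM size
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "azure_attributes": {
            "availability": "SPOT_WITH_FALLBACK_AZURE",  # use spot VMs, fall back to on-demand
            "spot_bid_max_price": -1,                    # pay up to the on-demand price
        },
    }
    # This dict mirrors the JSON body you would send to the Databricks Clusters API
    # (e.g., /api/2.0/clusters/create) with your usual HTTP client and auth token.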

Then when planning, Unravel gives you a full application catalog, both at a summary and drill-down level, of what’s running either as repeated jobs or ad hoc. You also get complexity analysis and data dependency reports so you know what you need to migrate and when in your wave plan. This automated report takes into account the complexity of your jobs, the data level and app level dependencies, and ultimately spits out a sprint plan that gives you the level of effort required. 

Click on image or here to see Unravel’s AI recommendations in action

But Unravel also helps with monitoring and optimizing your Databricks environment post-deployment to make sure that (a) everyone is using Databricks most effectively and (b) you’re getting the most out of your investment. With Unravel, you get full-stack observability metrics to understand exactly what’s going on with your jobs. But Unravel goes “beyond observability” to not just tell you what’s going on and why, but also tell you what to do about it.

By collecting and contextualizing data from a bunch of different sources—logs, Spark UI, Databricks console, APIs—Unravel’s AI engine automatically identifies where jobs could be tuned to run for higher performance or lower cost, with pinpoint recommendations on how to “fix things” for greater efficiency. This allows you to tune thousands of jobs on the fly, control costs proactively, and track actual vs. budgeted spend in real time.
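
To give a feel for what such a fix can look like in practice, here is a hedged PySpark sketch of the kind of configuration change a tuning recommendation might translate into; the specific values are placeholders, not recommendations.

    # Illustrative only: the kind of change a tuning recommendation might translate into.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("tuned-etl-job")
        # Right-size shuffle parallelism to the data volume instead of the default 200.
        .config("spark.sql.shuffle.partitions", "64")
        # Trim overprovisioned executor memory based on observed peak usage.
        .config("spark.executor.memory", "8g")
        # Let adaptive query execution coalesce small shuffle partitions at runtime.
        .config("spark.sql.adaptive.enabled", "true")
        .getOrCreate()
    )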

Why Databricks?

Amine then presented a concise summary of why he’s seen customers migrate to Databricks from Hadoop, recounting the high costs associated with Hadoop on-prem, the administrative complexity of managing the “zoo of technologies,” the need to decouple compute and storage to reduce waste of unused resources, the need to develop modern AI/ML use cases, not to mention the Cloudera end-of-life issue. He went on to illustrate the advantages and benefits of the Databricks data lakehouse platform, Delta Lake, and how by bringing together the best of Databricks and Azure into a single unified solution, you get a fully modern analytics and AI architecture.

Databricks lakehouse

He then went on to show how the kind of data-driven approach that Capgemini and Unravel take might look for different technologies migrating from Hadoop to Databricks.

Hadoop to Databricks complexity, ranked

Hadoop migration beyond Databricks

The Hadoop ecosystem over time has become extremely complicated and fragmented. If you look at all the components that might be in your Hortonworks or Cloudera legacy distribution today, and try to map them to the Azure modern analytics reference architecture, things get pretty complex.

complex Hadoop environment

Some things are relatively straightforward to migrate over to Databricks—Spark, HDFS, Hive—others, not so much. This is where his team at Azure Data Services can help out. He went through the considerations and mapping for a range of different components, including:

  • Oozie
  • Kudu
  • NiFi
  • Flume
  • Kafka
  • Storm
  • Flink
  • Solr
  • Pig
  • HBase
  • MapReduce
  • and more

He showed how these various services were used to make sure customers are covered, to fill in the gaps and complement Databricks for an end-to-end solution.

Mapping Hadoop to Azure

Check out the full webinar Optimizing and Migrating Hadoop to Azure Databricks on demand (no form to fill out!).

What’s New in Amazon EMR Unveiled at DataOps Unleashed 2022

At the DataOps Unleashed 2022 virtual conference, AWS Principal Solutions Architect Angelo Carvalho presented How AWS & Unravel help customers modernize their Big Data workloads with Amazon EMR. The full session recording is available on demand, but here are some of the highlights.

Angelo opened his session with a quick recap of some of the trends and challenges in big data today: the ever increasing size and scale of data; the variety of sources and stores and silos; people of different skill sets needing to access this data easily balanced against the need for security, privacy, and compliance; the expertise challenge in managing open source projects; and, of course, cost considerations.

He went on to give an overview of how Amazon EMR makes it easy to process petabyte-scale data using the latest open source frameworks such as Spark, Hive, Presto, Trino, HBase, Hudi, and Flink. But the lion’s share of his session delved into what’s new in Amazon EMR within the areas of cost and performance, ease of use, transactional data lakes, and security; the different EMR deployment options; and the EMR Migration Program.

What’s new in Amazon EMR?

Cost and performance

EMR takes advantage of the new Amazon Graviton2 instances to provide differentiated performance at lower cost—up to 30% better price-performance. Angelo presented some compelling statistics:

  • Up to 3X faster performance than standard Apache Spark at 40% of the cost
  • Up to 2.6X faster performance than open-source Presto at 80% of the cost
  • 11.5% average performance improvement with Graviton2
  • 25.7% average cost reduction with Graviton2

And you can realize these improvements out of the box while still remaining 100% compliant with open-source APIs. 

Ease of use

EMR Studio now supports Presto. EMR Studio is a fully managed integrated development environment (IDE) based on Jupyter notebooks that makes it easy for data scientists and data engineers to develop, visualize, and debug applications on an EMR cluster without having to log into the AWS console. So basically, you can attach and detach notebooks to and from the clusters using a single click at any time. 

Transactional data lakes

Amazon EMR has supported Apache Hudi for some time to enable transactional data lakes, but now it has added support for Spark SQL and Apache Iceberg. Iceberg is a high-performance format for huge analytic tables at massive scale. Created by Netflix and Apple, it brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, and Hive to work safely in the same tables at the same time.
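
For readers who want to see what this looks like in code, here is a minimal PySpark sketch of defining and querying an Iceberg table; the catalog name, S3 warehouse path, and table name are placeholders, and it assumes the Iceberg Spark runtime is available on the cluster (EMR provides it when Iceberg support is enabled).

    # Minimal sketch of using an Iceberg table from Spark SQL; names and paths are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-demo")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/iceberg/")  # placeholder
        .getOrCreate()
    )

    spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
    spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp())")
    spark.sql("SELECT * FROM demo.db.events").show()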

Security

Amazon EMR has a comprehensive set of security features and functions, including isolation, authentication, authorization, encryption, and auditing. The latest version adds user execution role authorizations, as well as fine-grained access controls (FGAC) using AWS Lake Formation, and auditing using Lake Formation via AWS CloudTrail.

Amazon EMR deployment options

There are multiple options for deploying Amazon EMR:

  • Deployment on Amazon EC2 allows customers to choose instances that offer optimal price and performance ratios for specific workloads.
  • Deployment on AWS Outposts allows customers to manage and scale Amazon EMR in on-premises environments, just as they would in the cloud.
  • Deployment on containers on top of Amazon Elastic Kubernetes Service (EKS). But note that at this time, Spark is the only big data framework supported by EMR on EKS.
  • Amazon EMR Serverless is a new deployment option that lets customers run petabyte-scale data analytics in the cloud without having to manage or operate server clusters (see the sketch just after this list).
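
To make the serverless option concrete, here is a minimal, illustrative sketch of submitting a Spark job to an existing EMR Serverless application with boto3; the application ID, role ARN, and script location are placeholders, and error handling is omitted.

    import boto3

    emr_serverless = boto3.client("emr-serverless")

    # Submit a Spark job to an existing EMR Serverless application.
    response = emr_serverless.start_job_run(
        applicationId="00example1234567",  # placeholder application ID
        executionRoleArn="arn:aws:iam::111122223333:role/emr-serverless-job-role",  # placeholder
        jobDriver={
            "sparkSubmit": {
                "entryPoint": "s3://example-bucket/jobs/etl_job.py",  # placeholder script path
                "sparkSubmitParameters": "--conf spark.executor.memory=4g",
            }
        },
    )
    print(response["jobRunId"])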

Using Amazon’s EMR migration program

The EMR migration program was launched to help customers streamline their migration and answer questions like, How do I move this massive data set to EMR? What will my TCO look like if I move to EMR? How do we implement security requirements? 

Amazon EMR Migration Program outcomes

Taking a data-driven approach to determine the optimal migration strategy, the Amazon EMR Migration Program (EMP) consists of three main steps:

1. Assessing the migration process begins with creating an initial TCO report, conducting discovery meetings, and using Unravel to quickly discover everything about the data estate. 

2. The mobilization stage involves delivering an assessment insights summary, qualifying for incentives, and developing a migration readiness plan.

3. The migration stage itself includes executing the lift-and-shift migration of applications and data, before modernizing the migrated applications.

Amazon relies on Unravel to perform a comprehensive AI-powered cloud migration assessment. As Angelo explained, “We partner with Unravel Data to take a faster, more data-driven approach to migration planning. We collect utilization data for about two to four weeks depending on the size of the cluster and the complexity of the workloads. 

“During this phase, we are looking to get a summary of all the applications running on the on-premises environment, which provides a breakdown of all workloads and jobs in the customer environment. We identify killed or failed jobs—applications that fail due to resource contention and/or lack of resources—and bursty applications or pipelines.

“For example, we would locate bursty apps to move to EMR, where they can have sufficient resources every time those jobs are run, in a cost-effective way via auto-scaling. We can also estimate migration complexity and effort required to move applications automatically. And lastly, we can identify tools suited for separate clusters. For example, if we identify long-running batch jobs that run at specific intervals, they might be good candidates for spinning a transient cluster only for that job.”

Unravel is equally valuable during and after migration. Its AI-powered recommendations for optimizing applications simplify tuning, and its full-stack insights accelerate troubleshooting.

GoDaddy results with Amazon EMR and Unravel

To illustrate, Angelo concluded with an Amazon EMR-Unravel success story: GoDaddy was moving 900 scripts to Amazon EMR, and each one had to be optimized for performance and cost in a long, time-consuming manual process. But with Unravel’s automated optimization for EMR, they spent 99% less time tuning jobs—from 10+ hours to 8 minutes—saving 2,700 hours of data engineering time. Performance improved by up to 72%, and GoDaddy realized $650,000 in savings on resource usage costs.

See the entire DataOps Unleashed session on demand.

From Data to Value: Building a Scalable Data Platform with Google Cloud

At the DataOps Unleashed 2022 conference, Google Cloud’s Head of Product Management for Open Source Data Analytics Abhishek Kashyap discussed how businesses are using Google Cloud to build secure and scalable data platforms. This article summarizes key takeaways from his presentation, Building a Scalable Data Platform with Google Cloud.

Data is the fuel that drives informed decision-making and digital transformation. Yet major challenges remain when it comes to tackling the demands of data at scale.

The world’s datasphere reached 33 zettabytes in 2018. IDC estimates that it will grow more than fivefold by 2025 to reach 175 zettabytes.

IDC estimates the datasphere to grow to 175ZB by 2025.

From data to value: A modern journey

“Building a [modern] data platform requires moving from very expensive traditional [on-prem] storage to inexpensive managed cloud storage because you’re likely going to get to petabytes—hundreds of petabytes of data—fairly soon. And you just cannot afford it with traditional storage,” says Abhishek.

He adds that you need real-time data processing for many jobs. To tie all the systems together—from applications to databases to analytics to machine learning to BI—you need “an open and flexible universal system. You cannot have systems which do not talk to each other, which do not share metadata, and then expect that you’ll be able to build something that scalable,” he adds.

Further, you need governed and scalable data sharing systems, and you need to build machine learning into your pipelines, your processes, and your analytics platform. “If you consider your data science team as a separate team that does special projects and needs to download data into their own cluster of VMs, it’s not going to work out,” says Abhishek.

And finally, he notes, “You have to present this in an integrated manner to all your users to platforms that they use and to user interfaces they are familiar with.”

The benefits of a modern data platform vs. traditional on-prem

The three tenets of a modern data platform

Abhishek outlined three key tenets of a modern data platform:

  • Scalability
  • Security and governance
  • End-to-end integration

He then showed how Google Cloud Platform addresses each of these challenges.

Scalability

Abhishek cited an example of a social media company that has to process more than a trillion messages each day. It has over 300 petabytes of data stored, mostly in BigQuery, and is using more than half a million compute cores. This scale is achievable because of two things: the separation of compute and storage, and being serverless. 

The Google Cloud Platform divorces storage from compute

By segregating storage from compute, “you can run your processing and analytics engine of choice that will scale independently of the storage,” explains Abhishek. Google offers a low-cost object store as well as an optimized analytical store for data warehousing workloads, BigQuery. You can run Spark and Dataproc, Dataflow for streaming, BigQuery SQL for your ETL pipelines, Data Fusion, your machine learning models on Cloud AI Platform—all without tying any of that to the underlying data so you can scale the two separately. Google has multiple customers who have run queries on 100,000,000,000,000 (one hundred trillion) rows on BigQuery, with one running 10,000 concurrent queries.

“Talk to any DBA or data engineer who’s working on a self-managed Hadoop cluster, or data warehouse on prem, and they’ll tell you how much time they spent in thinking about virtual machines resource provisioning, fine tuning, putting in purchase orders, because they’ll need to scale six months from now, etc, etc. All that needs to go away,” says Abhishek. He says if users want to use the data warehouse, all they think about is SQL queries. If they use Spark, all they think about is code. 

Google has serverless products for each step of the data pipeline. To ingest and distribute data reliably, there’s the serverless auto-scaling messaging queue Pub/Sub, Dataproc for serverless Spark, Dataflow for serverless Beam, and BigQuery (the first serverless data warehouse) for analysis. 
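
As a small illustration of what serverless means for the end user (the project, dataset, and table names below are placeholders), an analyst simply submits SQL with the BigQuery client library and never thinks about clusters.

    # Minimal sketch: run a SQL query against BigQuery with no clusters to size or manage.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # placeholder project
    query = """
        SELECT user_id, COUNT(*) AS events
        FROM `example-project.analytics.events`
        GROUP BY user_id
        ORDER BY events DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.user_id, row.events)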

Security and governance

With data at this scale, with users spread across business units and data spread across a variety of sources, security and governance must be automated. You can’t be manually filing tickets for every access request or manually auditing everything—that just doesn’t scale.

Google has a product called Dataplex, which essentially allows you to take a logical view of all your data spread across data warehouses, data lakes, and data marts, and build a centralized, governed set of data lakes whose life cycle you can manage. Let’s say you have structured, semi-structured, and streaming data stored in a variety of places, and you have analytics happening through BigQuery SQL or through Spark or Dataflow. Dataplex provides a layer in the middle that allows you to set up automatic data discovery, harvest metadata, surface a file in object storage as a table in BigQuery, track where data is going, and ensure data quality.

So you store your data where your applications need it, split your applications from the data, and have Dataplex manage security, governance, and lifecycle management for these data lakes. 

End-to-end integration

Effective data analytics ultimately serve to make data-driven insights available to anyone in the business who needs them. End-to-end integration with the modern data stack is crucial so that the platform accommodates the various tools that different teams are using—not the other way around.

The Google Cloud Platform does this by enhancing the capabilities of the enterprise data warehouse—specifically, BigQuery. BigQuery consolidates data from a variety of places to make it available for analytics in a secure way through the AI Platform, Dataproc for Spark jobs, Dataflow for streaming, Data Fusion for code-free ETL. All BI platforms work with it, and there’s a natural language interface so citizen data scientists can work with it. 

This allows end users to all work through the same platform, while doing their work through the languages and interfaces of their choice.

Google offering for each stage of the data pipeline

Wrapping up

Abhishek concludes with an overview of the products that Google offers for each stage of the data pipeline. “You might think that there are too many products,” Abhishek says. “But if you go to any customer that’s even medium size, what you’ll find is different groups have different needs, they work through different tools. You might find one group that loves Spark, you might find another group that needs to use Beam, or a third group that wants to use SQL. So to really help this customer build an end-to-end integrated data platform, we provide this fully integrated, very sophisticated set of products that has an interface for each of the users who needs to use this for the applications that they build.”

See how Unravel takes the risk out of managing your Google Cloud Platform migrations.

The New Cloud Migration Playbook: Strategies for Data Cloud Migration and Management

Experts from Microsoft, WANdisco, and Unravel Data recently outlined a step-by-step playbook—utilizing a data-driven approach—for migrating and managing data applications in the cloud.

Unravel Data Co-Founder and CTO Shivnath Babu moderated a discussion with Microsoft Chief Architect Amit Agrawal and WANdisco CTO Paul Scott-Murphy on how Microsoft, WANdisco, and Unravel complement each other in accelerating the migration of complex data applications like Hadoop to cloud environments like Azure.

Shivnath is an industry leader in making large-scale data platforms and distributed systems easy to manage. At Microsoft, Amit is responsible for driving cross-cloud solutions and is an expert in Hadoop migrations. Paul spearheads WANdisco’s “data-first strategy,” especially in how to migrate fast-changing and rapidly growing datasets to the cloud in a seamless fashion.

We’re all seeing how data workloads are moving to the cloud en masse. There are several reasons, with greater business agility probably topping the list. Plus you get access to the latest and greatest modern software to power AI/ML initiatives. If you optimize properly, the cloud can be hugely cost-beneficial. And then there’s the issue of Cloudera stopping support for its older on-prem software.

But getting to the cloud is not easy. Time and time again, we hear how companies’ migration to the cloud goes over budget and behind schedule. Most enterprises have complex on-prem environments, which are hard to understand. Mapping them to the best cloud topology is even harder—especially when there isn’t enough talent available with the expertise to migrate workloads correctly. All too often, migration involves a lot of time-consuming, error-prone manual effort. And one of the bigger challenges is simply the lack of prioritization from the business.

During the webinar, the audience was asked, “What challenges have you faced during your cloud migration journey?” Nearly two-thirds responded with “complexity of my current environment,” followed by “finding people with the right skills.”

cloud migration poll

These challenges are exactly why Microsoft, WANdisco, and Unravel have created a framework to help you accelerate your migration to Azure.

Hadoop Migration Framework

As enterprises build up their data environment over the years, it becomes increasingly difficult to understand what jobs are running and whether they’re optimized—especially for cost, as there’s a growing recognition that jobs which are not optimized can lead to cloud costs spiraling out of control very quickly—and then tie migration efforts to business priorities.

The Microsoft Azure framework for migrating complex on-prem data workloads to the cloud

Amit laid out the framework that Azure uses when first talking to customers about migration. It starts with an engagement discovery that basically “meets the customer where they are” to identify the starting point for the migration journey. Then they take a data-first approach by looking at four key areas:

  • Environment mapping: This is where Unravel really helps by telling you exactly what kind of services you’re running, where they’re running, which queues they belong to, etc. This translates into a map of how to migrate into Azure, so you have the right TCO from the start. You have a blueprint of what things will look like in Azure and a step-by-step process of where to migrate your workloads.
  • Application migration: With a one-to-one map of on-prem applications to cloud applications in place, Microsoft can give customers a clear sense of how, say, an end-of-life on-prem application can be retired in lieu of a modern application.
  • Workload assessment: Usually customers want to “test the waters” by migrating over one or two workloads to Azure. They are naturally concerned about what their business-critical applications will look like in the cloud. So Microsoft does an end-to-end assessment of the workload to see where it fits in, what needs to be done, and thereby give both the business stakeholder and IT the peace of mind that their processes will not break during migration.
  • Data migration: This is where WANdisco can be very powerful, with your current data estate migrated or replicated over to Azure and your data scientists starting to work on creating more use cases and delivering new insights that drive better business processes or new business streams.

Then, once all this is figured out, they determine what the business priorities are, what the customer goals are. Amit has found that this framework fits any and all customers and helps them realize value very quickly.

Data-first strategy vs. traditional approach to cloud migration

Data-First Strategy

A data-first strategy doesn’t mean you need to move your data before doing anything else, but it does mean that having data available in the cloud is critical to quickly gaining value from your target environment.

Without the data being available in some form, you can’t work with it or take advantage of the capabilities that a cloud environment like Azure offers.

A data-first migration differs from a traditional approach, which tends to be more application-centric—where entire sets of applications and their data in combination need to be moved before a hard cutover to use in the cloud environment.

As Paul Scott-Murphy from WANdisco explains, “That type of approach takes much longer to complete and doesn’t allow you to use data while a migration is under way. It also doesn’t allow you to continue to operate the source environment while you’re conducting a migration. So that may be well suited to smaller types of systems—transactional systems and online processing environments—but it’s typically not very well suited to the sort of information data sets and the work done against them from large-scale data platforms built around a data lake.

“So really what we’re saying here is that the migration of analytics infrastructure from data lake environments on premises like Hadoop and Spark and other distributed environments is extremely well suited to a data-first strategy, and what that strategy means is that your data become available for use in the cloud environment in as short a time as possible, really accelerating the pace with which you can leverage the value and giving you the ability to get faster outcomes from that data.

“The flexibility it offers in terms of automating the migration of your data sets and allowing you to continue operating those systems while the migration is underway is also really critical to the data-first migration approach.”

That’s why WANdisco developed its data-first strategy around four fundamental requirements:

1. The technology must efficiently handle arbitrary volumes of data. Data lakes can span potentially exabyte-scale data systems, and traditional tools for copying static information aren’t going to satisfy the needs of migrating data at scale.
2. You need a modern approach to support frequent and actively changing data. If you have data at scale, it is always changing—you’re ingesting that data set all the time, you’re constantly modifying information in your data lake.
3. You don’t want to suffer downtime in your business systems just for the purpose of migrating to the cloud. Incurring additional costs beyond the effort involved in migration is likely to be unacceptable to the business.
4. You need to validate or have a means of ensuring that your data migrated in full.

With those elements as the technical basis, WANdisco developed LiveData Migrator for Azure to support this approach to data-first migration. The approach that WANdisco technology enables is central to getting the most out of your migration effort.

Check out the case study of how WANdisco took a data-first approach to migrate a global telecom from on-prem Hadoop to Azure.

Unravel accelerates every stage of migration

How to Accelerate the Migration Journey

When teaming up with Microsoft and WANdisco, Unravel splits the entire migration into four stages:

1. Assess

This is really about assessing the current environment. Assessing a complex environment can take months—it’s not uncommon to see a 6-month assessment—which gets expensive and is often inaccurate about all the dependencies across all the apps and data and resources and pipelines. Identifying the intricate dependencies is critical to the next phase—we’ve seen how 80% of the time and effort in a migration can be just getting the dependencies right.

2. Plan

This is the most important stage. Because if the plan is flawed, with inaccurate or incomplete data, there is no way your migration will be done on time and within budget. It’s not unusual to see a migration have to be aborted partway through and then restarted.

3. Test, Fix, Verify

Usually the first thing you ask after a workload has migrated is: Is it correct? Are you getting the same or better performance than you got on-prem? And are you getting that performance at the right cost? Again, if you don’t get it right, the entire cost of migration can run significantly over budget.

4. Optimize, Manage, Scale

As more workloads come over to the cloud, scaling becomes a big issue. It can be a challenge to understand the entire environment and its cost, especially around governance. Because if you don’t bring in best practices and set up guardrails from the get-go, it can be very difficult to fix things later on. That can lead to low reliability, or maybe the clusters are auto-scaling and the expenses are much higher than they need to be—actually, a lot of things can go wrong.

Unravel can help with all of these challenges. And it helps by accelerating each stage. How can you accelerate assessment? Unravel automatically generates an X-ray of the complex on-prem environment, capturing all the dependencies and converting that information into a quick total cost of ownership analysis. That can be then used to justify and prioritize the business to actually move to the cloud pretty quickly. Then you can drill down from the high-level cost estimates into where the big bottlenecks are (and why).

This then feeds into the planning, where it’s really all about ensuring that the catalog, or inventory, of apps, data, and resources that has been built—along with all the dependencies that have been captured—is converted into a sprint-by-sprint plan for the migration.

As you go through these sprints, that’s where the testing, fixing, and verifying are done—especially getting a single pane of glass that can show the workloads as they are running on-prem and then the migrated counterparts on the cloud so that correctness, performance, and cost can be easily checked and verified against the baseline.

Then everything gets magnified as more and more apps are brought over. Unravel provides full-stack observability along with the ability to govern and fix problems. But most important, Unravel can help ensure that these problems in terms of performance, correctness, and cost never happen in the first place.

Unravel complements Microsoft and WANdisco at each step of the new data-first cloud migration playbook.

Check out the entire New Cloud Migration Playbook webinar on demand.

Top Cloud Data Migration Challenges and How to Fix Them

We recently sat down with Sandeep Uttamchandani, Chief Product Officer at Unravel, to discuss the top cloud data migration challenges in 2022. No question, the pace of data pipelines moving to the cloud is accelerating. But as we see more enterprises moving to the cloud, we also hear more stories about how migrations went off the rails. One report says that 90% of CIOs experience failure or disruption of data migration projects due to the complexity of moving from on-prem to the cloud. Here are Dr. Uttamchandani’s observations on the top obstacles to ensuring that data pipeline migrations are successful.

1. Knowing what should migrate to the cloud

The #1 challenge is getting an accurate inventory of all the different workloads that you’re currently running on-prem. The first question to be answered when you want to migrate data applications or pipelines is, Migrate what?

In most organizations, this is a highly manual, time-consuming exercise that depends on tribal knowledge and crowd-sourcing information. It’s basically going to every team running data jobs and asking what they’ve got going. Ideally, there would be a well-structured, single centralized repository where you could automatically discover everything that’s running. But in reality, we’ve never seen this. Virtually every enterprise has thousands of people come and go over the years—all building different dashboards for different reasons, using a wide range of technologies and applications—so things wind up being all over the place. The more brownfield the deployment, the more probable that workloads are siloed and opaque.
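
To illustrate what even a partial scripted inventory looks like (and why it falls short), here is a rough sketch that pulls running applications from a YARN ResourceManager; the host name is a placeholder, and it captures only YARN apps, missing exactly the dashboards, notebooks, and ad hoc tooling described above.

    # Rough first-pass inventory of what is actually running on a Hadoop cluster,
    # using the YARN ResourceManager REST API. The host name is a placeholder, and
    # this only covers YARN applications.
    import requests
    from collections import Counter

    rm = "http://resourcemanager.example.internal:8088"  # placeholder host
    apps = requests.get(f"{rm}/ws/v1/cluster/apps", timeout=30).json()["apps"]["app"]

    by_type = Counter(app["applicationType"] for app in apps)
    by_user = Counter(app["user"] for app in apps)
    print("Applications by type:", dict(by_type))
    print("Top users:", by_user.most_common(5))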

It’s important to distinguish between what you have and what you’re actually running. Sandeep recalls one large enterprise migration where they cataloged 1200+ dashboards but discovered that some 40% of them weren’t being used! They were just sitting around—it would be a waste of time to migrate them over. The obvious analogy is moving from one house to another: you accumulate a lot of junk over the years, and there’s no point in bringing it with you to the new place. 

Understanding exactly what’s actually running is the first step to cloud migration.

How Unravel helps identify what to migrate

Unravel’s always-on full-stack observability capabilities automatically discover and monitor everything that’s running in your environment, in one single view. You can zero in on what jobs are actually operational, without having to deal with all the clutter of what’s not. If it’s running, Unravel finds it—and learns as it goes. If we apply the 80/20 rule, that means Unravel finds 80% of your workloads immediately because you’re running them now; the other 20% get discovered when they run every month, three months, six months, etc. Keep Unravel on and you’ll automatically always have an up-to-date inventory of what’s running.

2. Determining what’s feasible to migrate to the cloud

After you know everything that you’re currently running on-premises, you need to figure out what to migrate. Not everything is appropriate to run in the cloud. But identifying which workloads are feasible to run in the cloud and which are not is no small task. Assessing whether it makes sense to move a workload to the cloud normally requires a series of trial-and-error experiments. It’s really a “what if” analysis that involves a lot of heavy lifting. You take a chunk of your workload, make a best-guess estimate on the proper configuration, run the workload, and see how it performs (and at what cost). Rinse and repeat, balancing cost and performance.

Look before you leap into the cloud—gather intelligence to make the right choices.

 

How Unravel helps determine what to move

With intelligence gathered from its underlying data model and enriched with experience managing data pipelines at large scale, Unravel can perform this what-if analysis for you. It’s more directionally accurate than any other technique (other than moving things to the cloud and seeing what happens). You’ll get a running head start into your feasibility assessment, reducing your configuration tuning efforts significantly.

3. Defining a migration strategy 

Once you determine whether moving a particular workload is a go/no-go from a cost and business feasibility/agility perspective, the sequence of moving jobs to the cloud—what to migrate when—must be carefully ordered. Workloads are all intertwined, with highly complex interdependencies. 

Source: Sandeep Uttamchandani, Medium: Wrong AI, How We Deal with Data Quality Using Circuit Breakers 

You can’t simply move any job to the cloud randomly, because it may break things. That job or workload may depend on data tables that are potentially sitting back on-prem. The sequencing exercise is all about how to carve out very contained units of data and processing that you can then move to the cloud, one pipeline at a time. You obviously can’t migrate the whole enchilada at once—moving hundreds of petabytes of data in one go is impossible—but understanding which jobs and which tables must migrate together can take months to figure out.
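
A toy example makes the sequencing idea concrete: if you know which jobs read which tables and which jobs produce them, a topological ordering tells you what has to move first or together. The names below are made up, and real dependency graphs are orders of magnitude larger.

    # Toy illustration of the sequencing problem using only the standard library.
    from graphlib import TopologicalSorter

    # Each key depends on the items in its set
    # (job -> tables it reads, table -> job that writes it).
    dependencies = {
        "orders_table": set(),
        "daily_revenue_job": {"orders_table"},
        "revenue_table": {"daily_revenue_job"},
        "exec_dashboard_job": {"revenue_table", "orders_table"},
    }

    # One valid migration order: everything a unit depends on comes before it.
    print(list(TopologicalSorter(dependencies).static_order()))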

How Unravel helps you define your migration strategy

Unravel gives you a complete understanding of all dependencies in a particular pipeline. It maps dependencies between apps, datasets, users, and resources so you can easily analyze everything you need to migrate to avoid breaking a pipeline.

4. Tuning data workloads for the cloud

Once you have workloads migrated, you need to optimize them for this new (cloud) environment. Nothing works out of the box. Getting everything to work effectively and efficiently in the cloud is a challenge. The same Spark query, the same big data program, that you have written will need to be tuned differently for the cloud. This is a function of complexity: the more complex your workload, the more likely it will need to be tuned specifically for the cloud. This is because the model is fundamentally different. On-prem is a small number of large clusters; the cloud is a large number of small clusters.

This is where two different philosophical approaches to cloud migration come into play: lift & shift vs. lift & modernize. While lift & shift simply takes what you have running on-prem and moves it to the cloud, lift & modernize essentially means first rewriting workloads or tuning them so that they are optimized for the cloud before you move them over. It’s like a dirt road that gets damaged with potholes and gullies after a heavy rain. You can patch it up time after time, or you can pave it to get a “new” road.

But say you have migrated 800 workloads. It would take months to tune all 800 workloads—nobody has the time or people for that. So the key is to prioritize. Tackle the most complex workloads first because that’s where you get “more bang for your buck.” That is, the impact is worth the effort. If you take the least complex query and optimize it, you’ll get only a 5% improvement. But tuning a 60-hour query to run in 6 hours has a huge impact.
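A simple way to pick that order is to rank workloads by how much cluster time they actually consume. The workload names and numbers below are invented purely to illustrate the ranking; a complexity score or a dollar figure could stand in for runtime just as easily.

```python
# Rank workloads by estimated impact, approximated here as runtime x run frequency.
workloads = [
    {"name": "nightly_fact_build", "runtime_hr": 60.0, "runs_per_month": 30},
    {"name": "adhoc_report",       "runtime_hr": 0.5,  "runs_per_month": 20},
    {"name": "feature_pipeline",   "runtime_hr": 8.0,  "runs_per_month": 120},
]

for w in workloads:
    w["monthly_hours"] = w["runtime_hr"] * w["runs_per_month"]

# Tune the biggest consumers first; that's where the effort pays back the most.
for w in sorted(workloads, key=lambda w: w["monthly_hours"], reverse=True):
    print(f'{w["name"]:>20}: {w["monthly_hours"]:.0f} cluster-hours/month')
```

The 60-hour nightly build tops the list, matching the intuition above: cutting it from 60 hours to 6 is worth far more than shaving minutes off a small ad hoc query.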

AI-powered recommendations show where applications can be optimized.

How Unravel helps tune data pipelines

Unravel has a complexity report that shows you at a glance which workloads are complex and which are not. Then Unravel helps you tune complex cloud data pipelines in minutes instead of hours (maybe even days or weeks). Because it's designed specifically for modern data pipelines, you have immediate access to all the observability data about every data job you're running at your fingertips, without having to manually stitch together logs, metrics, traces, APIs, platform UI information, etc.

But Unravel goes a step further. Instead of simply presenting all this information about what's happening and where—and leaving you to figure out what to do next—Unravel's "workload-aware" AI engine provides actionable recommendations on specific ways to optimize performance and costs. You get pinpoint guidance on code you can rewrite, configurations you can tweak, and more.


5. Keeping cloud migration costs under control

The moment you move to the cloud, you'll start burning a hole in your wallet. Given the on-demand nature of the cloud, most organizations lose control of their costs. When first moving to the cloud, almost everyone spends more than budgeted—sometimes as much as 2X. So you need good cost governance. One approach to managing costs is to simply turn off workloads when you threaten to run over budget, but this doesn't really help. At some point, those jobs need to run. Most overspending is due to overprovisioned cloud resources, so a better approach is to configure instances based on actual usage requirements rather than on perceived need.
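The idea behind that kind of right-sizing can be sketched in a few lines. The instance ladder, the 40% utilization thresholds, and the sample jobs below are all illustrative assumptions, not a description of how any particular tool (Unravel included) computes its recommendations.

```python
# Compare what each job requested with what it actually used, and flag instances
# that can safely be stepped down one size. All names and thresholds are placeholders.
INSTANCE_LADDER = ["r5.large", "r5.xlarge", "r5.2xlarge", "r5.4xlarge"]

jobs = [
    {"name": "etl_orders", "instance": "r5.4xlarge", "peak_cpu": 0.22, "peak_mem": 0.31},
    {"name": "ml_scoring", "instance": "r5.2xlarge", "peak_cpu": 0.85, "peak_mem": 0.78},
]

def recommend(job, cpu_threshold=0.40, mem_threshold=0.40):
    # If peak CPU and memory both stayed under ~40%, one size down is likely safe.
    idx = INSTANCE_LADDER.index(job["instance"])
    if job["peak_cpu"] < cpu_threshold and job["peak_mem"] < mem_threshold and idx > 0:
        return INSTANCE_LADDER[idx - 1]
    return job["instance"]

for job in jobs:
    print(f'{job["name"]}: {job["instance"]} -> {recommend(job)}')
```

A job that never touches more than a third of the memory it asked for is the classic overprovisioning case this kind of check catches.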

How Unravel helps keep cost under control

Because Unravel has granular, job-level intelligence about what resources are actually required to run individual workloads, it can identify where you’ve requested more or larger instances than needed. Its AI recommendations automatically let you know what a more appropriately “right-sized” configuration would be. Other so-called cloud cost management tools can tell you how much you’re spending at an aggregated infrastructure level—which is okay—but only Unravel can pinpoint at a workload level exactly where and when you’re spending more than you have to, and what to do about it.

6. Getting data teams to think differently

Not so much a technology issue but more a people and process matter, training data teams to adopt a new mindset can actually be a big obstacle. Some of what is very important when running data pipelines on-prem—like scheduling—is less crucial in the cloud. The fact that running one instance for 10 hours costs the same as running 10 instances for one hour is a big change in the way people think. On the other hand, some aspects assume greater importance in the cloud—like having to think about the number and size of instances to configure.
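The arithmetic behind that shift is worth spelling out, with a placeholder hourly rate:

```python
# Cloud billing is roughly instances x hours x hourly rate, so the two options below
# cost the same; the parallel one simply finishes ten times sooner.
rate_per_hour = 0.50                        # illustrative rate, not a real price

serial_cost   = 1 * 10 * rate_per_hour      # 1 instance running for 10 hours
parallel_cost = 10 * 1 * rate_per_hour      # 10 instances running for 1 hour

print(serial_cost, parallel_cost)           # both print 5.0
```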

Some of this is just growing pains, like moving from a typewriter to a word processor. There was a learning curve in figuring out how to navigate through Word. But over time as people got more comfortable with it, productivity soared. Similarly, training teams to ramp up on running data pipelines in the cloud and overcoming their initial resistance is not something that can be accomplished with a quick fix.

How Unravel helps clients think differently

Unravel helps untangle the complexity of running data pipelines in the cloud. Specifically, with just the click of a button, non-expert users get AI-driven recommendations in plain English about what steps to take to optimize the performance of their jobs, what their instance configurations should look like, how to troubleshoot a failed job, and so forth. It doesn’t require specialized under-the-hood expertise to look at charts and graphs and data dumps to figure out what to do next. Delivering actionable insights automatically makes managing data pipelines in the cloud a lot less intimidating.

Next steps

Check out our 10-step strategy to a successful move to the cloud in Ten Steps to Cloud Migration, or create a free account and see for yourself how Unravel helps simplify the complexity of cloud migration.

The post Top Cloud Data Migration Challenges and How to Fix Them appeared first on Unravel.

Big Data Meets the Cloud https://www.unraveldata.com/resources/big-data-meets-the-cloud/ https://www.unraveldata.com/resources/big-data-meets-the-cloud/#respond Tue, 21 Dec 2021 15:37:28 +0000 https://www.unraveldata.com/?p=8185 Computer Network Background Abstract

This article by Unravel CEO Kunal Agarwal originally appeared as a Forbes Technology Council post under the title The Future of Big Data Is in the Hybrid Cloud: Part 2 and has been updated to reflect 2021 statistics.

With interest in big data and cloud increasing around the same time, it wasn’t long until big data began being deployed in the cloud. Big data comes with some challenges when deployed in traditional, on-premises settings. There’s significant operational complexity, and, worst of all, scaling deployments to meet the continued exponential growth of data is difficult, time-consuming, and costly.

The cloud provides the perfect solution to this problem since it was built for convenience and scalability. In the cloud, you don’t have to tinker around trying to manually configure and troubleshoot complicated open-source technology. When it comes to growing your deployments, you can simply hit a few buttons to instantly roll out more instances of Hadoop, Spark, Kafka, Cloudera, or any other big data app. This saves money and headaches by eliminating the need to physically grow your infrastructure and then service and manage that larger deployment. Moreover, the cloud allows you to roll back these deployments when you don’t really need them—a feature that’s ideal for big data’s elastic computing nature.

Big data’s elastic compute requirements mean that organizations will have a great need to process big data at certain times but little need to process it at other times. Consider the major retail players. They likely saw massive surges of traffic on their websites this past Cyber Monday, which generated a reported $10.7 billion in sales. These companies probably use big data platforms to provide real-time recommendations for shoppers as well as to analyze and catalog their actions. In a traditional big data infrastructure, a company would need to deploy physical servers to support this activity. These servers would likely not be needed the other 364 days of the year, resulting in wasted expenditures. However, in the cloud, retail companies could simply spin up the big data and the resources that are needed and then get rid of them when traffic subsides.

This sort of elasticity occurs on a day-to-day basis for many companies that are driving the adoption of big data. Most websites experience a few hours of peak traffic and few hours of light traffic each day. Think of social media, video streaming, or dating sites. Elasticity is a major feature of big data, and the cloud provides the elasticity to keep those sites performing under any conditions.

One important thing to keep in mind when deploying big data in the cloud is cost assurance. In situations like the ones described above, organizations suddenly use a lot more compute and other resources. It’s important to have set controls when operating in the cloud to prevent unforeseen, massive cost overruns. In short, a business’s autoscaling rules must operate within its larger business context so it’s not running over budget during traffic spikes. And it’s not just the sudden spikes you need to worry about. A strict cost assurance strategy needs to be in place even as you gradually migrate apps and grow your cloud deployments. Costs can rise quickly based on tiered pricing, and there’s not always a lot of visibility depending on the cloud platform.
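As a rough illustration of autoscaling that stays inside a business budget, the sketch below caps the scale-out target at what the remaining daily budget can pay for. The per-node price, the daily budget, and the load rule are hypothetical; a real guardrail would hook into the cloud provider's autoscaling and billing APIs rather than hard-coded numbers.

```python
# A rough sketch of budget-aware autoscaling: scale out for the traffic spike, but
# never beyond what the remaining daily budget can pay for.
PRICE_PER_NODE_HOUR = 0.40
DAILY_BUDGET = 500.0

def desired_nodes(requests_per_sec, spent_so_far, hours_left_today):
    wanted = max(2, requests_per_sec // 100)        # naive load-based target
    budget_left = max(DAILY_BUDGET - spent_so_far, 0.0)
    affordable = int(budget_left / (PRICE_PER_NODE_HOUR * max(hours_left_today, 1)))
    return min(int(wanted), affordable)

# Traffic spike late in the day, with most of the budget already spent:
print(desired_nodes(requests_per_sec=2500, spent_so_far=460.0, hours_left_today=6))  # 16, not 25
```

In practice a check like this sits alongside the autoscaler as a ceiling, so a Cyber Monday spike is absorbed without blowing through the budget owner's policy.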


A Hybrid Future

Of course, the cloud isn’t ideal for all big data deployments. Some amount of sensitive data, such as financial or government records, will always be stored on-premises. Also, in specific environments such as high-performance computing (HPC), data will often be kept on-premises to meet rigorous speed and latency requirements. But for most big data deployments, the cloud is the best way to go.

As a result, we can expect to see organizations adopt a hybrid cloud approach in which they deploy more and more big data in the cloud but keep certain applications in their own data centers. Hybrid is the way of the future, and the market seems to be bearing that out. A hybrid approach allows enterprises to keep their most sensitive and heaviest data on-premises while moving other workloads to the public cloud.

It’s important to note that this hybrid future will also be multi-cloud, with organizations putting big data in a combination of AWS, Azure, and Google Cloud. These organizations will have the flexibility to operate seamlessly between public clouds and on-premises. The different cloud platforms have different strengths and weaknesses, so it makes sense for organizations embracing the cloud to use a combination of platforms to best accommodate their diverse needs. In doing so, they can also help optimize costs by migrating apps to the cloud that is cheapest for that type of workload. A multi-cloud approach is also good for protecting data, enabling customers to keep apps backed up in another platform. Multi-cloud also helps avoid one of the bigger concerns about the cloud: vendor lock-in.

Cloud adoption is a complex, dynamic life cycle—there aren’t firm start and finish dates like with other projects. Moving to the cloud involves phases such as planning, migration, and operations that, in a way, are always ongoing. Once you’ve gotten apps to the cloud, you’re always trying to optimize them. Nothing is stationary, as your organization will continue to migrate more apps, alter workload profiles, and roll out new services. In order to accommodate the fluidity of the cloud, you need the operational capacity to monitor, adapt, and automate the entire process.

The promise of big data was always about the revolutionary insights it offers. As the blueprint for how best to deploy, scale, and optimize big data becomes clearer, enterprises can focus more on leveraging insights from that data to drive new business value. Embracing the cloud may seem complex, but the cloud’s scale and agility allow organizations to mine those critical insights at greater ease and lower cost.

Next Steps

Be sure to check out Unravel Director of Solution Engineering Chris Santiago’s on-demand webinar recording—no form to fill out—on Reasons Why Big Data Cloud Migrations Fail and Ways to Succeed.

The post Big Data Meets the Cloud appeared first on Unravel.

A Primer on Hybrid Cloud and Edge Infrastructure https://www.unraveldata.com/resources/a-primer-on-hybrid-cloud-and-edge-infrastructure/ https://www.unraveldata.com/resources/a-primer-on-hybrid-cloud-and-edge-infrastructure/#respond Tue, 02 Nov 2021 02:06:14 +0000 https://www.unraveldata.com/?p=7794 Cloud Pastel Background

Thank you for your interest in the 451 Research Report, Living on the edge: A primer on hybrid cloud and edge infrastructure.

You can download it here.

451 Research: Living on the edge: A primer on hybrid cloud and edge infrastructure
Published Date: October 11, 2021

Introduction
Without the internet, the cloud is nothing. But few of us really understand what is inside the internet. What is the so-called ‘edge’ of the internet, and why does it matter? And how does cloud play into the edge story? This primer seeks to explain these issues to a non-tech audience.

Get the 451 Take. Download Report.

The post A Primer on Hybrid Cloud and Edge Infrastructure appeared first on Unravel.

Twelve Best Cloud & DataOps Articles https://www.unraveldata.com/resources/twelve-best-cloud-dataops-articles/ https://www.unraveldata.com/resources/twelve-best-cloud-dataops-articles/#respond Thu, 28 Oct 2021 19:27:17 +0000 https://www.unraveldata.com/?p=7774 Computer Network Background Abstract

Our resource picks for October!
Prescriptive Insights On Cloud & DataOps Topics

Interested in learning about different technologies and methodologies, such as Databricks, Amazon EMR, cloud computing and DataOps? A good place to start is reading articles that give tips, tricks, and best practices for working with these technologies.

Here are some of our favorite articles from experts on cloud migration, cloud management, Spark, Databricks, Amazon EMR, and DataOps!

Cloud Migration

Cloud-migration Opportunity: Business Value Grows but Missteps Abound
(Source: McKinsey & Company)
Companies aim to embrace the cloud more fully, but many are already failing to reap the sizable rewards. Outperformers have shown what it takes to overcome the costly hurdles and could potentially unlock $1 trillion in value, according to a recent McKinsey article.

4 Major Mistakes That Can Derail a Cloud Migration (Source: MDM)
If your organization is thinking of moving to the cloud, it’s important to know both what to do and what NOT to do. This article details four common missteps that can hinder your journey to the cloud. One such mistake is not having a cloud migration strategy.

Check out the full article on the Modern Distribution Management (MDM) site to learn about other common mistakes, their impacts, and ways to avoid them.

Plan Your Move: Three Tips For Efficient Cloud Migrations (Source: Forbes)
Think about the last time you moved to a new place. Moving is usually exciting, but the logistics can get complicated. The same can be said for moving to the cloud.

Just as a well-planned move is often the smoothest, the same holds true for cloud migrations.

As you’re packing up your data and workloads to transition business services to the cloud, check out this article on Forbes for three best practices for cloud migration planning.

(Bonus resource: Check out our Ten Steps to Cloud Migration post. If your company is considering making the move, these steps will help!)

Cloud Management

How to Improve Cloud Management (Source: DevOps)
The emergence of technologies like AI and IoT as well as the spike in remote work due to the COVID-19 pandemic have accelerated cloud adoption.

With this growth comes a need for a cloud management strategy in order to avoid unnecessarily high costs and security or compliance violations. This DevOps article shares insights on how to build a successful cloud management strategy.

The Challenges of Cloud Data Management (Source: TechTarget)
Cloud spend and the amount of data in the cloud continues to grow at an unprecedented rate. This rapid expansion is causing organizations to also face new cloud management challenges as they try to keep up with cloud data management advancements.

Head over to TechTarget to learn about cloud management challenges, including data governance and adhering to regulatory compliance frameworks.

Spark

Spark: A Data Engineer’s Best Friend (Source: CIO)
Spark is the ultimate tool for data engineers. It simplifies the work environment by providing a platform to organize and execute complex data pipelines and powerful tools for storing, retrieving, and transforming data.

This CIO article describes different things data engineers can do with Spark, touches on what makes Spark unique, and explains why it is so beneficial for data engineers.

Is There Life After Hadoop? The Answer is a Resounding Yes. (Source: CIO)
Many organizations that invested heavily in the Hadoop ecosystem have found themselves wondering what life after Hadoop is like and what lies ahead.

This article addresses life after Hadoop and lays out a strategy for organizations entering the post-Hadoop era, including reasons why you may want to embrace Spark as an alternative. Head to the CIO site for more!

Databricks

5 Ways to Boost Query Performance with Databricks and Spark (Source: Key2 Consulting)
When running Spark jobs on Databricks, do you often find yourself frustrated by slow query times?

Check out this article from Key2 Consulting to discover 5 rules for speeding up query times. The rules include:

  • Cache intermediate big dataframes for repetitive use (see the PySpark sketch below).
  • Monitor the Spark UI within a cluster where a Spark job is running.

For more information on these rules and to find out the remaining three, check out the full article.
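As a quick illustration of the first rule, here is a minimal PySpark sketch that caches an intermediate dataframe reused by several downstream queries. The table path and column names are placeholders; the caching pattern is the point.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# Hypothetical source data; substitute your own table or path.
events = spark.read.parquet("/mnt/raw/events")

# An expensive intermediate result that several queries below will reuse.
daily = events.groupBy("event_date", "user_id").count()

daily.cache()   # keep the intermediate dataframe in memory across actions
daily.count()   # materialize the cache once

# Both of these reuse the cached result instead of rescanning the source files.
top_users = daily.orderBy(F.desc("count")).limit(100)
daily_totals = daily.groupBy("event_date").agg(F.sum("count").alias("total_events"))

top_users.show()
daily_totals.show()
```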

What is a Data Lakehouse and Why Should You Care? (Source: S&P Global)
A data lakehouse is an environment designed to combine the data structure and data management features of a data warehouse with the low-cost storage of a data lake.

Databricks offers a couple of data lakehouse technologies, including Delta Lake and Delta Engine. This article from S&P Global gives a more comprehensive explanation of what a data lakehouse is, its benefits, and what lakehouses you can use on Databricks.

Amazon EMR

What is Amazon EMR? – Amazon Elastic MapReduce Tutorial (Source: ADMET)
Amazon EMR is one of the most popular cloud-based big data platforms. It provides a managed framework for running data processing workloads simply, cost-effectively, and securely.

In this ADMET blog, learn what Amazon Elastic MapReduce is and how it can be used to deal with a variety of issues.

DataOps

3 Steps for Successfully Implementing Industrial DataOps (Source: eWeek)
DataOps has been growing in popularity over the past few years. Today, we see many industrial operations realizing the value of DataOps.

This article explains three steps for successfully implementing industrial DataOps:

1. Make industrial data available
2. Make data useful
3. Make data valuable

Head over to eWeek for a deeper dive into the benefits of implementing industrial DataOps and what these three steps really mean.

Using DataOps To Maximize Value For Your Business (Source: Forbes)
Everybody is talking about artificial intelligence and data, but how do you make it real for your business? That’s where DataOps comes in.

From this Forbes article, learn how DataOps can be used to solve common business challenges, including:

  • A process mismatch between traditional data management and newer techniques such as AI.
  • A lack of collaboration to drive a successful cultural shift and support operational readiness.
  • An unclear approach to measuring success across the organization.

In Conclusion

Knowledge is power! We hope our data community enjoys these resources and they provide valuable insights to help you in your current role and beyond.

Be sure to visit our library of resources on DataOps, Cloud Migration, Cloud Management (and more) for best practices, happenings, and expert tips and techniques. If you want to know more about Unravel Data, you can sign up for a free account or contact us.

The post Twelve Best Cloud & DataOps Articles appeared first on Unravel.

Migrating Data Pipelines from Enterprise Schedulers to Airflow https://www.unraveldata.com/resources/migrating-data-pipelines-from-enterprise-schedulers-to-airflow/ https://www.unraveldata.com/resources/migrating-data-pipelines-from-enterprise-schedulers-to-airflow/#respond Thu, 02 Sep 2021 17:16:42 +0000 https://www.unraveldata.com/?p=7197 Abstract Blue Light Background

What is a Data Pipeline?

Data pipelines convert rich, varied, and high-volume data sources into insights that power the innovative data products that many of us run today. Shivnath represents a typical data pipeline using the diagram below.

Data Pipeline Chart

In a data pipeline, data is continuously captured and then stored in a distributed storage system, such as a data lake or data warehouse. From there, a lot of computation happens on the data to transform it into the key insights that you want to extract. These insights are then published and made available for consumption.

Modernizing Data Stacks and Pipelines

Many enterprises have already built data pipelines on stacks such as Hadoop, or using solutions such as Hive and HDFS. Many of these pipelines are orchestrated with enterprise schedulers, such as Autosys, Tidal, Informatica, or Pentaho, or with native schedulers. For example, Hadoop comes with a native scheduler called Oozie.

In these environments, there are common challenges people face when it comes to their data pipelines. These problems include:

  • Large clusters supporting multiple apps and tenants: Clusters tend to be heavily multi-tenant and some apps may struggle for resources.
  • Less agility: In these traditional environments, there tends to be less agility in terms of adding more capabilities and releasing apps quickly.
  • Harder to scale: In these environments, data pipelines tend to be in large data centers where you may not be able to add resources easily.

These challenges are causing many enterprises to modernize their stacks. In the process, they are picking innovative schedulers, such as Airflow, and they’re changing their stacks to incorporate systems like Databricks, Snowflake, or Amazon EMR. With modernization, companies are often striving for:

  • Smaller, decentralized, app-focused clusters: Instead of running large clusters, companies are trying to run smaller, more focused environments.
  • More agility and easier scalability: When clusters are smaller, they also tend to be more agile and easier to scale. This is because you can decouple storage from compute, then allocate resources when you need them.

Data Stacks Pipeline Examples

Shivnath shares even more goals of modernization, including removing resources as a constraint when it comes to how fast you can release apps and drive ROI, as well as reducing cost.

So why does Airflow often get picked as part of modernization? The goals that motivated the creation of Airflow often tie in very nicely with the goals of modernization efforts. Airflow enables agile development and is better for cloud-native architectures compared to traditional schedulers, especially in terms of how fast you can customize or extend it. Keeping with the modern methodology of agility, Airflow is also available as a service from companies like Amazon and Astronomer.
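For readers who haven't seen it, the sketch below shows what a simple nightly job looks like when re-expressed as an Airflow DAG (Airflow 2.x syntax). The DAG id, schedule, and bash commands are placeholders standing in for a cron- or Oozie-style workflow; the point is that the whole pipeline, including its dependencies, lives in ordinary Python code that can be versioned, reviewed, and extended like any other software.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_ingest",
    start_date=datetime(2021, 9, 1),
    schedule_interval="0 2 * * *",   # 2:00 AM daily, like the old cron entry
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'pull from source'")
    transform = BashOperator(task_id="transform", bash_command="echo 'run spark job'")
    publish = BashOperator(task_id="publish", bash_command="echo 'publish results'")

    extract >> transform >> publish  # dependencies declared in code, not in a GUI
```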

Diving deeper into the process of modernization, there are two main phases at a high level: Phase 1 (Assess and Plan) and Phase 2 (Migrate, Validate, and Optimize). The rest of the presentation dives deep into the key lessons that Shivnath and Hari have learned from helping a large number of enterprises migrate from their traditional enterprise schedulers and stacks to Airflow and modern data stacks.

Lessons Learned

Phase 1: Assess and Plan

The assessment and planning phase of modernization is made up of a series of other phases, including:

  • Pipeline discovery: First you have to discover all the pipelines that need to be migrated.
  • Resource usage analysis: You have to understand the resource usage of the pipelines.
  • Dependency analysis: More importantly, you have to understand all the dependencies that the pipelines may have.
  • Complexity analysis: You need to understand the complexity of modernizing the pipelines. For example, some pipelines that run on-prem can actually have thousands of stages and run for many hours.
  • Mapping to the target environment.
  • Cost estimation for target environment.
  • Migration effort estimation.

Shivnath said that he has learned two main lessons from the assessment and planning phase:

Lesson 1: Don’t underestimate the complexity of pipeline discovery

Multiple schedulers may be used, such as Autosys, Informatica, Oozie, Pentaho, Tidal, etc. And worse, there may not be any common pattern in how these pipelines work, access data, schedule and name apps, or allocate resources.

Lesson 2: You need very fine-grained tracking from a telemetry data perspective

Due to the complexity of data pipeline discovery, fine-grained tracking is needed to do a good job of resource usage estimation, dependency analysis, and mapping the complexity and cost of running pipelines in a newer environment.

After describing the two lessons, Shivnath goes through an example to further illustrate what he has learned.

Shivnath then passes it on to Hari, who speaks about the lessons learned during the migration, validation, and optimization phase of modernization.

Phase 2: Migrate, Validate, and Optimize

While Shivnath shared various methodologies that have to do with finding artifacts and discovering the dependencies between them, there is also a need to instill a sense of confidence in the entire migration process. This confidence can be achieved by validating the operational side of the migration journey.

Data pipelines, regardless of where they live, are prone to suffer from the same issues, such as:

  • Failures and inconsistent results
  • Missed SLAs and growing lag/backlog
  • Cost overruns (especially prevalent in the cloud)

To maintain the overall quality of your data pipelines, Hari recommends constantly evaluating pipelines using three major factors: correctness, performance, and cost. Here’s a deeper look into each of these factors:

  • Correctness: This refers to data quality. Artifacts such as tables, views, or CSV files are generated at almost every stage of the pipeline. We need to lay down the right data checks at these stages, so that we can make sure that things are consistent across the board. For example, a check could be that the partitions of a table should have at least n number of records. Another check could be that a specific column of a table should never have null values.
  • Performance: Evaluating performance has to do with setting SLAs and maintaining baselines for your pipeline to ensure that performance needs are met after the migration. Most orchestrators have SLA monitoring baked in. For example, in Airflow the notion of an SLA is incorporated in the operator itself (see the sketch after this list). Additionally, if your resource allocations have been properly estimated during the migration assessment and planning phase, often you'll see that SLAs are similar and maintained. But in a case where something unexpected arises, tools like Unravel can help maintain baselines, and help troubleshoot and tune pipelines, by identifying bottlenecks and suggesting performance improvements.
  • Cost: When planning migration, one of the most important parts is estimating the cost that the pipelines will incur and, in many cases, budgeting for it. Unravel can actually help monitor the cost in the cloud. And by collecting telemetry data and interfacing with cloud vendors, Unravel can offer vendor-specific insights that can help minimize the running cost of these pipelines in the cloud.
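To make the correctness and performance points concrete, here is a minimal Airflow 2.x sketch: one task runs a partition row-count check of the kind described above, and its SLA is declared directly on the operator. The DAG id, the threshold, and the hard-coded row count are placeholders; a real check would query the warehouse through a hook or a dedicated data-quality library.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def check_partition_counts(**_):
    # Placeholder for a real warehouse query (e.g., via a provider hook).
    row_count = 12_345
    if row_count < 1_000:                       # the data-quality rule
        raise ValueError(f"Partition too small: {row_count} rows")


with DAG(
    dag_id="orders_pipeline_quality",
    start_date=datetime(2021, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    validate = PythonOperator(
        task_id="validate_orders_partition",
        python_callable=check_partition_counts,
        sla=timedelta(hours=2),                 # alert if the task runs past its SLA
    )
```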

Hari then demos several use cases where he can apply the lessons learned. To set the stage, a lot of Unravel’s enterprise customers are migrating from the more traditional on-prem pipelines, such as Oozie and Tidal, to Airflow. The examples in this demo are actually motivated by real scenarios that customers have faced in their migration journey.

The post Migrating Data Pipelines from Enterprise Schedulers to Airflow appeared first on Unravel.

Reasons Why Big Data Cloud Migrations Fail https://www.unraveldata.com/resources/reasons-why-big-data-cloud-migrations-fail-and-ways-to-succeed/ https://www.unraveldata.com/resources/reasons-why-big-data-cloud-migrations-fail-and-ways-to-succeed/#respond Fri, 09 Apr 2021 22:01:56 +0000 https://www.unraveldata.com/?p=8127

The post Reasons Why Big Data Cloud Migrations Fail appeared first on Unravel.

Moving Big Data and Streaming Data Workloads to Google Cloud Platform https://www.unraveldata.com/resources/moving-big-data-and-streaming-data-workloads-to-google-cloud-platform/ https://www.unraveldata.com/resources/moving-big-data-and-streaming-data-workloads-to-google-cloud-platform/#respond Fri, 05 Feb 2021 22:24:18 +0000 https://www.unraveldata.com/?p=8152

The post Moving Big Data and Streaming Data Workloads to Google Cloud Platform appeared first on Unravel.

Minding the Gaps in Your Cloud Migration Strategy https://www.unraveldata.com/resources/minding-the-gaps-in-your-cloud-migration-strategy/ https://www.unraveldata.com/resources/minding-the-gaps-in-your-cloud-migration-strategy/#respond Thu, 28 Jan 2021 13:00:52 +0000 https://www.unraveldata.com/?p=5833

As your organization begins planning and budgeting for 2021 initiatives, it’s time to take a critical look at your cloud migration strategy. If you’re planning to move your on-premises big data workloads to the cloud this year, you’re undoubtedly faced with a number of questions and challenges:

  • Which workloads are best suited for the cloud?
  • How much will each workload cost to run?
  • How do you manage workloads for optimal performance, while keeping costs down?

Gartner Cloud Migration Report

Neglecting careful workload planning and controls prior to cloud migration can lead to unforeseen cost spikes. That’s why we encourage you to read Gartner’s new report that cites serious gaps in how companies move to the cloud: “Mind the Gaps in DBMS Cloud Migration to Avoid Cost and Performance Issues.”

Gartner’s timely report provides invaluable information for any enterprise with substantial database spending, whether on-premises, in the cloud, or migrating to the cloud. Organizations typically move to the cloud to save money, cutting costs by an average of 21% according to the report. However, Gartner finds that migrations are often more expensive and disruptive than initially planned because organizations neglect three crucial steps:

  • Price/performance comparison. They fail to assess the price and performance of their apps, both on-premises and after moving to the cloud.
  • Apps conversion assessment. They don’t assess the cost of converting apps to run effectively in the cloud, then get surprised by failed jobs and high costs.
  • Ops conversion assessment. DataOps tasks change greatly across environments, and organizations don’t maximize their gains from the move.

When organizations do not take these important steps, they typically fail to complete the migration on time, overspend against their established cloud operational budgets, and miss critical optimization opportunities available in the cloud.

Remove the Risk of Cloud Migration With Unravel Data

Unravel Data can help you fill in the gaps cited in the Gartner report, providing full-stack observability and AI-powered recommendations to drive more reliable performance on Azure, AWS, Google Cloud Platform, or in your own data center. By simplifying, operationalizing, and automating performance improvements, applications are more reliable, and costs are lower. Your team and your workflows will be more efficient and productive, so you can focus your resources on your larger vision.

To learn more – including information about our Cloud Migration Acceleration Programs – contact us today. And make sure to download your copy of the Gartner report. Or start by reading our two-page executive summary.

The post Minding the Gaps in Your Cloud Migration Strategy appeared first on Unravel.

Cost-Effective, High-Performance Move to Cloud https://www.unraveldata.com/resources/cost-effective-high-performance-move-to-cloud/ https://www.unraveldata.com/resources/cost-effective-high-performance-move-to-cloud/#respond Thu, 05 Nov 2020 21:38:49 +0000 https://www.unraveldata.com/?p=5484

The post Cost-Effective, High-Performance Move to Cloud appeared first on Unravel.

The Ten Steps To A Successful Cloud Migration Strategy https://www.unraveldata.com/ten-steps-to-cloud-migration/ https://www.unraveldata.com/ten-steps-to-cloud-migration/#respond Tue, 03 Nov 2020 06:53:36 +0000 https://www.unraveldata.com/?p=5296 Map with dragons

In cloud migration, also known as "move to cloud," you move existing data processing tasks to a cloud platform, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform, to private clouds, and/or to hybrid cloud solutions.

See our blog post, What is Cloud Migration, for an introduction.

Figure 1: Steps in cloud migration. (Courtesy IBM, via Slideshare.)

Cloud migration usually includes most or all of the following steps:

  1. Identifying on-premises workloads to move to cloud
  2. Baselining performance and resource use before moving
  3. (Potentially) optimizing the workloads before the move
  4. Matching services/components to equivalents on different cloud platforms
  5. Estimating the cost and performance of the software, post-move
  6. Estimating the cost and schedule for the move
  7. Tracking and ensuring success of the migration
  8. Optimizing the workloads in the cloud
  9. Managing the workloads in the cloud
  10. Rinse and repeat

Some of the steps are interdependent; for instance, it helps to know the cost of moving various workloads when choosing which ones to move first. Both experiences with cloud migration and supportive tools can help in obtaining the best results.

The bigger the scale of the workloads you are considering moving to the cloud, the more likely you will be to want outside consulting help, useful tools, or both. With that in mind, it might be worthwhile to move a smaller workload or two first, so you can gain experience and decide how best to proceed for the larger share of the work.

We briefly describe each of the steps below, to help create a shared understanding for all the participants in a cloud migration project, and for executive decision-makers. We’ve illustrated the steps with screenshots from Unravel Data, which includes support for cloud migration as a major feature. The steps, however, are independent of any specific tool.

Note. You may decide, at any step, not to move a given workload to the cloud – or not to undertake cloud migration at all. Keep track of your results, as you complete each step, and evaluate whether the move is still a good idea.

The Relative Advantages of On-Premises and Cloud

Before starting a cloud migration project, it’s valuable to have a shared understanding of the relative advantages of on-premises installations vs. cloud platforms.

On-premises workloads tend to be centrally managed by a large and dedicated staff, with many years of expertise in all the various technologies and business challenges involved. Allocations of servers, software licenses, and operations people are budgeted in advance. Many organizations keep some or all of their workloads on-premises due to security concerns about the cloud, but these concerns may not always be well-founded.

On-premises hosting has high capital costs; long waits for new hardware, software updates, and configuration changes; and difficulties finding skilled people, including for supporting older technologies.

Cloud workloads, by contrast, are often managed in a more ad hoc fashion by small project teams. Servers and software licenses can be acquired instantly, though it still takes time to find skilled people to run them. (This is a major factor in the rise of DevOps & DataOps, as developers and operations people learn each other’s skills to help get things done.)

The biggest advantage of working in the cloud is flexibility, including reduced need for in-house staff. The biggest disadvantage is the flip side of that same flexibility: surprises as to costs. Costs can go up sharply and suddenly. It’s all too easy for a single complex query to cost many thousands of dollars to run, or for hundreds of thousands of dollars in costs to appear unexpectedly. Also, the skilled people needed to develop, deploy, maintain, and optimize cloud solutions are in short supply.

When it comes to building up an organization’s ability to use the cloud, fortune favors the bold. Organizations that develop a reputation as strong cloud shops are more able to attract the talent needed to get benefits from the cloud. Organizations which fall behind have trouble catching up.

Even some of these bold organizations, however, often keep a substantial on-premises footprint. Security, contract requirements with customers, lower costs for some workloads (especially stable, predictable ones), and high costs for migrating specific workloads to the cloud are among the reasons for keeping some workloads on-premises.

The Ten Steps To A Successful Cloud Migration

1. Identifying On-Premises Workloads to Move to Cloud

Typically, a large organization will run a wide variety of workloads, some on-premises, and some in the cloud. These workloads are likely to include:

  • Third-party software and open source software hosted on company-owned servers; examples include Hadoop, Kafka, and Spark installations
  • Software created in-house hosted on company-owned servers
  • SaaS packages hosted on the SaaS provider’s servers
  • Open source software, third-party software, and in-house software running on public cloud platform, private cloud, or hybrid cloud servers

The cloud migration motion is to examine software running on company-owned servers and either replace it with a SaaS offering, or move it to a public, private, or hybrid cloud platform.

2. Baselining Performance and Resource Use

It’s vital to analyze each on-premises workload that is being considered for move to cloud for performance and resource use. To move the workload successfully, similar resources in the cloud need to be identified and costed. And the new, cloud-based version must meet or exceed the performance of the on-premises original, while running at lower cost. (Or while gaining flexibility and other benefits that are deemed worth any cost increase.)

Baselining performance and resource use for each workload may need to include:

  • Identifying the CPU and memory usage of the workload
  • Identifying all elements in the software stack that supports the workload, on-premises
  • Identifying dependencies, such as data tables used by a workload
  • Identifying the most closely matching software elements available on each cloud platform that’s under consideration
  • Identifying custom code that will need to be ported to the cloud
  • Specifying work needed to adapt code to different supporting software available in the cloud (often older versions of the software that’s in use on-premises)
  • Specifying work needed to adapt custom code to similar code platforms (for instance, modifying Hive SQL to a SQL version available in the cloud)

This work is easier if a target cloud platform has already been chosen, but you may need to compare estimates for your workloads on several cloud platforms before choosing.
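As a small illustration of the baselining step, the sketch below rolls a few per-run metrics for one workload up into the peak and average figures you would size cloud instances against. The sample runs are invented; in practice they would be pulled from your resource manager or monitoring system across a representative period.

```python
# Roll per-run metrics up into a baseline profile for one workload.
runs = [
    {"duration_hr": 3.8, "avg_cpu_cores": 42, "peak_cpu_cores": 96,  "peak_mem_gb": 310},
    {"duration_hr": 4.1, "avg_cpu_cores": 38, "peak_cpu_cores": 88,  "peak_mem_gb": 295},
    {"duration_hr": 4.6, "avg_cpu_cores": 51, "peak_cpu_cores": 110, "peak_mem_gb": 352},
]

baseline = {
    "runs": len(runs),
    "max_duration_hr": max(r["duration_hr"] for r in runs),
    "avg_cpu_cores": sum(r["avg_cpu_cores"] for r in runs) / len(runs),
    "peak_cpu_cores": max(r["peak_cpu_cores"] for r in runs),
    "peak_mem_gb": max(r["peak_mem_gb"] for r in runs),
}
print(baseline)   # the numbers you size cloud instances (and SLAs) against
```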

3. (Potentially) Optimizing the Workloads, Pre-Move

GIGO – Garbage In, Garbage Out – is one of the older acronyms in computing. If a workload is a mess on-premises, and you move it, largely unchanged, to the cloud, then it will be a mess in the cloud as well.

If a workload runs poorly on-premises, that may be acceptable, given that the hardware and software licenses needed to run the workload are already paid for; the staff needed for operations are already hired and working. But in the cloud, you pay for resource use directly. Unoptimized workloads can become very expensive indeed.

Unfortunately, common practice is to not optimize workloads – and then implement a rough, rapid lift and shift, to meet externally imposed deadlines. This may be followed by shock and awe over the bills that are then generated by the workload in the cloud. If you take your time, and optimize before making a move, you’ll avoid trouble later.

4. Matching Services to Equivalents in the Cloud

Many organizations will have decided on a specific platform for some or all of their move to cloud efforts, and will not need to go through the selection process again for the cloud platform. In other cases, you’ll have flexibility, and will need to look at more than one cloud platform.

For each target cloud platform, you’ll need to choose the target cloud services you wish to use. The major cloud services providers offer a plethora of choices, usually closely matched to on-premises offerings. You are also likely to find third-party offerings that offer some combination of on-premises and multi-cloud functionality, such as Databricks and Snowflake on AWS and Azure.

The cloud platform and cloud services you choose will also dictate the amount of code revision you will have to do. One crucial detail: software versions in the cloud may be behind, or ahead of, the versions you are using on-premises, and this can necessitate considerable rework.

There are products out there to help automate some of this work, and specialized consultancies that can do it for you. Still, the cost and hassle involved may prevent you from moving some workloads – in the near term, or at all.

Software available on major cloud platforms is shown in Figure 2, from our blog post, Understanding Cloud Data Services (recommended).

Figure 2: Software available on major cloud platforms. (Subject to change)

5. Estimating the Cost and Performance of the Software, Post-Move

Now you can estimate the cost and performance of the software, once you’ve moved it to your chosen cloud platform and services. You need to estimate the instance sizes you’ll need and CPU costs, memory costs, and storage costs, as well as networking costs and costs for the services you use. Estimating all of this can be very difficult, especially if you haven’t done much cloud migration work in the past.
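A back-of-the-envelope version of that estimate looks like the sketch below. Every unit price is an illustrative placeholder rather than a published rate, and a real estimate would add managed-service charges, support tiers, and discounts, but the shape of the calculation is the same.

```python
# Rough monthly estimate for one workload: compute + storage + egress.
nodes, hours_per_run, runs_per_month = 8, 4.0, 30
price_per_node_hour = 0.45            # assumed instance price
storage_gb, price_per_gb_month = 5_000, 0.023
egress_gb, price_per_egress_gb = 200, 0.09

compute = nodes * hours_per_run * runs_per_month * price_per_node_hour
storage = storage_gb * price_per_gb_month
egress = egress_gb * price_per_egress_gb

print(f"compute ${compute:,.0f}  storage ${storage:,.0f}  egress ${egress:,.0f}  "
      f"total ${compute + storage + egress:,.0f}/month")
```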

Unravel Data is a valuable tool here. It provides detailed estimates, which are likely to be far easier to get, and more accurate, than you could come up with on your own. (Through either estimation, experimentation, or both.) You need to install Unravel Data on your on-premises stack, then use it to generate comparisons across all cloud choices.

An example of a comparison between an on-premises installation and a targeted move to AWS EMR is shown in Figure 3.

Figure 3: On-premises to AWS EMR move comparison

6. Estimating the Cost and Schedule for the Move

After the above analysis, you need to schedule the move, and estimate the cost of the move itself:

  • People time. How many people-hours will the move require? Which people, and what won’t get done while they’re working on the move?
  • Calendar time. How many weeks or months will it take to complete all the necessary tasks, once dependencies are taken into account?
  • Management time. How much management time and attention, including the time to complete the schedule and cost estimates, is being consumed by the move?
  • Direct costs. What are the costs in person-hours? Are there costs for software acquisition, experimentation on the cloud platform, and lost opportunities while the work is being done?

Each of these figures is difficult to estimate, and each depends to some degree on the others. Do this as best you can, to make a solid business decision as to whether to proceed. (See the next step.)

7. Tracking and Ensuring Success of the Migration

We will spend very little digital “ink” here on the crux of the process: actually moving the workloads to the chosen cloud platform. It’s a lot of work, with many steps and sub-steps of its own.

However, all of those steps are highly contingent on the specific workloads you’re moving, and on the decisions you’ve made up to this point. (For instance, you may try a quick and dirty “lift and shift,” or you may take the time to optimize workloads first, on-premises, before making the move.)

So we will simply point out that this step is difficult and time-consuming. It’s also very easy to underestimate the time, cost, and heartburn attendant to this part of the process. As mentioned above, doing a small project first may help reduce problems later.

8. Optimizing the Workloads in the Cloud

This step is easy to ignore, but it may be the most important step of all in making the move a success. You don’t have to optimize workloads on-premises before moving them. But you must optimize the workloads in the cloud after the move. Optimization will improve performance, reduce costs, and help you significantly in achieving better results going forward.

Only after optimization of the workload in the cloud can you calculate an ROI for the project. The ROI you obtain may signal that move to cloud is a vital priority for your organization – or signal that the ancient mapmaker’s warning for mariners, Here Be Dragons, should be considered as you move workloads to the cloud.

Figure 4: Ancient map with sea monsters. (Source: Atlas Obscura.)

9. Managing the Workloads in the Cloud

You have to actively manage workloads in the cloud, or you’re likely to experience unexpected shocks as to cost. Actually, you’re going to get shocked; but if you manage actively, you’ll keep those shocks to a few days’ worth of bills. If you manage passively, your shocks will equate to one or several months’ worth of bills, which can lead you to question the entire basis of your move to cloud strategy.
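Active management can start as simply as watching the daily bill for anomalies. The sketch below compares the latest day's spend against the trailing average and raises an alert on a spike; the numbers are invented, and in practice the series would come from your cloud provider's billing or cost export.

```python
# Flag a spend spike after a day or two, not after a month of surprise bills.
daily_spend = [410, 395, 430, 405, 418, 1280]   # last value is the anomaly

baseline = sum(daily_spend[:-1]) / len(daily_spend[:-1])
latest = daily_spend[-1]

if latest > 1.5 * baseline:
    print(f"ALERT: yesterday's spend ${latest} is {latest / baseline:.1f}x the trailing average")
```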

Actively managing your workloads in the cloud can also help you identify further optimizations – both for operations in the cloud, and for each of the steps above. And those optimizations are crucial to calculating your ROI for this effort, and to successfully making future cloud migration decisions, one workload at a time.

10. Rinse and Repeat

Once you’ve successfully made your move to cloud decision pay off for one, or a small group of workloads, you’ll want to repeat it. Or, possibly, you’ve learned that move to cloud is not a good idea for many of your existing and planned workloads.

Just as one example, if you currently have a lot of your workloads on a legacy database vendor, you may find many barriers to moving workloads to the cloud. You may need to refactor, or even re-imagine, much of your data processing infrastructure before you can consider cloud migration for such workloads.

That’s the value of going through a process like this for a single, or a small set of workloads first: once you’ve done this, you will know much of what you previously didn’t know. And you will be rested and ready for the next steps in your move to cloud journey.

Unravel Data as a Force Multiplier for Move to Cloud

Unravel Data is designed to serve as a force multiplier in each step of a move to cloud journey. This is especially true for big data moves to cloud.

As the name implies, big data processing may be the most challenging set of workloads to move: it has the greatest potential for generating surprising, even shocking, costs and resource allocation issues as you pursue cloud migration.

Unravel Data has been engineered to help you solve these problems. Unravel Data can help you easily estimate the costs and challenges of cloud migration for on-premises workloads heading to AWS, to Microsoft Azure, and to Google Cloud Platform (GCP).

Unravel Data is optimized for big data technologies:

  • On-premises (source): Includes Cloudera, Hadoop, Impala, Hive, Spark, and Kafka
  • Cloud (destination): Similar technologies in the cloud, as well as AWS EMR, AWS Redshift, AWS Glue, AWS Athena, Azure HDInsight, Google Dataproc, and Google BigQuery, plus Databricks and Snowflake, which run on multiple public clouds.

We will be adding many on-premises sources, and many cloud destinations, in the months ahead. Both relational databases (which natively support SQL) and NoSQL databases are included in our roadmap.

Unravel is available in each of the cloud marketplaces for AWS, Azure, and GCP. Among the many resources you can find about cloud migration are a video from AWS for using Unravel to move to Amazon EMR, and a blog post from Microsoft for moving big data workloads to Azure HDInsight with Unravel.

We hope you have enjoyed, and learned from, reading this blog post. If you want to know more, you can create a free account or contact us.

The post The Ten Steps To A Successful Cloud Migration Strategy appeared first on Unravel.

A Simple Explanation Of What Cloud Migration Actually Is https://www.unraveldata.com/what-is-cloud-migration/ https://www.unraveldata.com/what-is-cloud-migration/#respond Fri, 30 Oct 2020 18:09:16 +0000 https://www.unraveldata.com/?p=5272

Cloud migration, also called "move to the cloud," is the process of moving existing data processing tasks to a cloud platform, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Private clouds and hybrid cloud solutions can also serve as destinations.

Organizations move to the cloud because it’s a much easier place to start new experiments or projects; gain operational flexibility; cut costs; and take advantage of services available in the cloud, for instance, AI and machine learning.

Steps in cloud migration include selecting workloads to move, choosing a target cloud platform, choosing what services to use in the cloud, and estimating and assessing costs. See our blog post, Ten Steps to Cloud Migration, for details.

There are many ways to compare on-premises vs. cloud IT spending, but one such estimate, from Flexera, places on-premises workloads at just over two-thirds of spending, with cloud about one-third. Cloud is expected to gain a further 10% of the total in a 12-month period. So cloud migration is being pursued energetically and quickly, overall.

Many cloud migration efforts, however, underachieve or fail. In a recent webinar, Chris Santiago of Unravel Data spelled out three reasons for challenges:

  • Poor planning
  • Failure to optimize for the cloud
  • Failure to achieve anticipated ROI

Unravel Data has support for cloud migration as a major feature, especially for big data workloads, including many versions of Hadoop, Spark, and Kafka, both on-premises and in the cloud. If you want to know more, you can create a free account.

The post A Simple Explanation Of What Cloud Migration Actually Is appeared first on Unravel.

Reasons Why Your Big Data Cloud Migration Fails and Ways to Overcome https://www.unraveldata.com/resources/reasons-why-your-big-data-cloud-migration-fails-and-ways-to-overcome/ https://www.unraveldata.com/resources/reasons-why-your-big-data-cloud-migration-fails-and-ways-to-overcome/#respond Wed, 21 Oct 2020 21:47:48 +0000 https://www.unraveldata.com/?p=5241

The post Reasons Why Your Big Data Cloud Migration Fails and Ways to Overcome appeared first on Unravel.

]]>

The post Reasons Why Your Big Data Cloud Migration Fails and Ways to Overcome appeared first on Unravel.

]]>
https://www.unraveldata.com/resources/reasons-why-your-big-data-cloud-migration-fails-and-ways-to-overcome/feed/ 0
Unravel AWS Cloud Migration EBook https://www.unraveldata.com/resources/unravel-aws-cloud-migration-ebook/ https://www.unraveldata.com/resources/unravel-aws-cloud-migration-ebook/#respond Tue, 09 Jun 2020 05:08:07 +0000 https://www.unraveldata.com/?p=5438 Welcoming Point72 Ventures and Harmony Partners to the Unravel Family

Thank you for your interest in the Unravel AWS Cloud Migration eBook.

You can download it here.

The post Unravel AWS Cloud Migration EBook appeared first on Unravel.

Unravel Introduces Workload Migration and Cost Analytics Solution for Azure Databricks https://www.unraveldata.com/unravel-introduces-workload-migration-and-cost-analytics-solution-for-azure-databricks-now-available-on-azure-marketplace/ https://www.unraveldata.com/unravel-introduces-workload-migration-and-cost-analytics-solution-for-azure-databricks-now-available-on-azure-marketplace/#respond Tue, 25 Feb 2020 12:00:43 +0000 https://www.unraveldata.com/?p=4480 Abstract Background and Azure Databricks Logo

PALO ALTO, Calif. – February 25, 2020 – Unravel Data, the only data operations platform providing full-stack visibility and AI-powered recommendations to drive more reliable performance in modern data applications, introduced new migration, cost analytics and architectural mapping capabilities for Unravel for Azure Databricks, which is now generally available from Unravel and in the Azure Marketplace. The move further solidifies Unravel's mission to support modern data workloads wherever they exist, whether on-premises, in the public cloud or a hybrid setting.

“With more and more big data deployments moving to the public cloud, Unravel has spent the last several years helping to simplify the process of cloud migration as well as improving the management and optimization of modern data workloads once in the cloud. We have recently introduced platforms for all major public cloud platforms,” said Bala Venkatrao, Chief Product Officer, Unravel Data. “This release, highlighted by the industry’s only slice and dice migration capabilities, makes it easier than ever to move data workloads to Azure Databricks, while minimizing costs and increasing performance. The platform also allows enterprises to unify their data pipelines end-to-end, such as Azure Databricks and Azure HDInsight.”

Unravel for Azure Databricks delivers comprehensive monitoring, troubleshooting, and application performance management for Azure Databricks environments. The new additions to the platform include:

  • Slice and dice migration support – Unravel now includes robust migration intelligence to help customers assess their migration planning to Azure Databricks in version 4.5.5.0. Slice and dice migration support provides impact analysis by applications and workloads. It also features recommended cloud cluster topology and cost estimates by service-level agreement (SLA), as well as auto-scaling impact trend analysis as a result of cloud migration.
  • Cost analytics – Unravel will soon add new cost management capabilities to help optimize Azure Databricks workloads as they scale. These new features include cost assurance, cost planning and cost forecasting tools. Together, these tools provide granular detail of individual jobs in Azure Databricks, providing visibility at the workspace, job, and job-run level to track costs or DBUs over time.
  • Detailed architectural recommendations: Unravel for Azure Databricks will soon include right-sizing, a feature that recommends virtual machine or workload types that will achieve the same performance on cheaper clusters.

Unravel for Azure Databricks helps operationalize Spark apps on the platform: Azure Databricks customers can shorten the cycle of getting Spark applications into production by relying on the visibility, operational intelligence, and data-driven insights and recommendations that only Unravel can provide. Users enjoy greater productivity by eliminating the time spent on tedious, low-value tasks such as log data collection, root cause analysis, and application tuning.

In addition to being generally available directly from Unravel, Unravel for Azure Databricks is also available on the Azure Marketplace, where users can try a free trial of the platform and get $2000 in Azure credits. Cloud marketplaces are quickly becoming the preferred way for organizations to procure, deploy and manage enterprise software. Unravel for Azure Databricks on Azure Marketplace offers one-click deployment of Databricks performance monitoring and management in Azure.

About Unravel Data
Unravel Data radically transforms the way businesses understand and optimize the performance and cost of their modern data applications – and the complex data pipelines that power those applications. Providing a unified view across the entire data stack, Unravel’s market-leading data observability platform leverages AI, machine learning, and advanced analytics to provide modern data teams with the actionable recommendations they need to turn data into insights. Some of the world’s most recognized brands like Adobe, 84.51˚ (a Kroger company), and Deutsche Bank rely on Unravel Data to unlock data-driven insights and deliver new innovations to market. To learn more, visit https://www.unraveldata.com.

Media Contact
Blair Moreland
ZAG Communications for Unravel Data
unraveldata@zagcommunications.com

The post Unravel Introduces Workload Migration and Cost Analytics Solution for Azure Databricks appeared first on Unravel.

Rebuilding Reliable Modern Data Pipelines Using AI and DataOps https://www.unraveldata.com/resources/rebuilding-reliable-modern-data-pipelines-using-ai-and-dataops/ https://www.unraveldata.com/resources/rebuilding-reliable-modern-data-pipelines-using-ai-and-dataops/#respond Mon, 10 Feb 2020 19:47:32 +0000 https://www.unraveldata.com/?p=8036 Cloud Pastel Background


Organizations today are building strategic applications using a wealth of internal and external data. Unfortunately, data-driven applications that combine customer data from multiple business channels can fail for many reasons. Identifying the cause and finding a fix is both challenging and time-consuming. With this practical ebook, DevOps personnel and enterprise architects will learn the processes and tools required to build and manage modern data pipelines.

Ted Malaska, Director of Enterprise Architecture at Capital One, examines the rise of modern data applications and guides you through a complete data operations framework. You’ll learn the importance of testing and monitoring when planning, building, automating, and managing robust data pipelines in the cloud, on premises, or in a hybrid configuration.

  • Learn how performance management software can reduce the risk of running modern data applications
  • Take a deep dive into the components that comprise a typical data processing job
  • Use AI to provide insights, recommendations, and automation when operationalizing modern data systems and data applications
  • Plan, migrate, and operate modern data stack workloads and data pipelines using cloud-based and hybrid deployment models

The post Rebuilding Reliable Modern Data Pipelines Using AI and DataOps appeared first on Unravel.

The Role of Data Operations During Cloud Migrations https://www.unraveldata.com/the-role-of-data-operations-during-cloud-migrations/ https://www.unraveldata.com/the-role-of-data-operations-during-cloud-migrations/#respond Wed, 27 Nov 2019 14:03:28 +0000 https://www.unraveldata.com/?p=3902 digital grid backgroun


Over the last few years, there has been a mad rush within enterprise organizations to move big data workloads to the cloud. On the surface, it looks like a bit of a keeping-up-with-the-Joneses syndrome, with organizations moving big data workloads to the cloud simply because they can.

It turns out, however, that the business rationale for moving big data workloads to the cloud is essentially the same as the broader cloud migration story: rather than expending scarce resources on building, managing, and monitoring infrastructure, use those resources, instead, to create value.

In the case of big data workloads, that value comes in the form of uncovering insights in the data or building and operationalizing machine learning models, among others.

To realize this value, however, organizations must move big data workloads to the cloud at scale without adversely impacting performance or incurring unexpected cost overruns. As they do so, they begin to recognize that migrating production big data workloads from on-premises environments to the cloud introduces and exposes a whole new set of challenges that most enterprises are ill-prepared to address.

The most progressive of these organizations, however, are also finding an unexpected solution in the discipline of data operations.


The Challenges With Migrating Big Data to the Cloud

The transition from an on-premises big data architecture to a cloud-based or hybrid approach can expose an enterprise to several operational, compliance and financial risks borne of the complexity of both the existing infrastructure and the transition process.

In many cases, however, these risks are not discovered until the transition is well underway — and often after an organization has already been exposed to some kind of meaningful risk or negative business impact.

As enterprises begin the process of migration, they often discover that their big data workloads are exceedingly complex, difficult to understand, and that their teams lack the skills necessary to manage the transition to public cloud infrastructure.

Once the migrations are underway, the complexity and variability of cloud platforms can make it difficult for teams to effectively manage the movement of these in-production workloads while they navigate the alphabet soup of options (IaaS, PaaS, SaaS, lift and shift, refactoring, cloud-native approaches, and so on) to find the proper balance of agility, elasticity, performance, and cost-effectiveness.

During this transitional period, organizations often suffer from runaway and unexpected costs, and significant operational challenges that threaten the operational viability of these critical workloads.

Worse yet, the initial ease of transition (instantiating instances, etc.) belies the underlying complexity, creating a false sense of security that is gloriously smashed.

Even after those initial challenges are overcome, or at least somewhat mitigated, organizations find that, having migrated these critical and intensive workloads to the cloud, the workloads now present an ongoing management and optimization challenge.

While there is no question that the value driving the move of these workloads to the cloud remains valid and worthwhile, enterprises often realize it only after much gnashing of teeth and long, painful weekends. The irony, however, is that these challenges are often avoidable when organizations first embrace the discipline of data operations and apply it to the migration of their big data workloads to the cloud.

Data Operations: The Bridge Over Troubled Waters

The biggest reason organizations unnecessarily suffer this fate is that they buy into the sex appeal (or the executive mandate – every CIO needs to be a cloud innovator right?) of the cloud sales pitch: just turn it on, move it over, and all is well.

While there is unquestionable value in moving all forms of operations to the cloud, organizations have repeatedly found that it is rarely as simple as merely spinning up an instance and turning on an environment.

Instead of jumping first and sorting out the challenges later, organizations must plan for the complexity of a big data workload migration. One of the simplest ways of doing so is through the application of data operations.

While data operations is a holistic, process-oriented discipline that helps organizations manage data pipelines, organizations can also apply it to significant effect during the migration process. The reason this works is that data operations uses process and performance-oriented data to manage the data pipeline and its associated workloads.

Because of this data orientation, it also provides deep visibility into those workloads — precisely what organizations require to mitigate the less desirable impacts of migrating to the cloud.

Using data operations, and tools built to support it, organizations can gather data that enables them to assess and plan their migrations to the cloud, do baselining, instance mapping, capacity forecasting and, ultimately, cost optimization.

It is this data — or more precisely, the lack of it — that is often the difference between successful big data cloud migrations and the pain and suffering that most organizations now endure.


The Intellyx Take: Visibility is the Key

When enterprises undertake a big data cloud migration, they must step through three core stages of the effort: planning, the migration process itself, and continual post-migration optimization.

Because most organizations lack sufficient data, they often do only limited planning efforts. Likewise, lacking data and with a pressing urgency to execute, they often rush migration efforts and charge full-steam ahead. The result is that they spend most of their time and resources dealing with the post-migration optimization efforts — where it is both most expensive and most negatively impactful to the organization.

Flipping this process around and minimizing the risk, costs, and negative impact of a migration requires the same key ingredient during each of these three critical stages: visibility.

Organizations need the ability to capture and harness data that enables migration teams to understand the precise nature of their workloads, map workload requirements to cloud-based instances, forecast capacity demands over time, and dynamically manage all of this as workload requirements change and the pipeline transitions and matures.

Most importantly, this visibility also enables organizations to plan and manage phased migrations in which they can migrate only slices of applications at a time, based on specific and targeted demands and requirements. This approach enables not only faster migrations, but also simultaneously reduces both costs and risk.

Of course, this type of data-centric visibility demands tools highly tuned to the specific needs of data operations. Therefore, those organizations that are taking this more progressive and managed approach to big data migrations are turning to purpose-built data operations tools, such as Unravel, to help them manage the process.

The business case for moving big data operations to the cloud is clear. The pathway to doing so without inflicting significant negative impact on the organization, however, is less so.

Leading organizations, therefore, have recognized that they can leverage the data operations discipline to smooth this process and give them the visibility they need to realize value from the cloud, without taking on unnecessary cost, risk or pain.

Copyright © Intellyx LLC. As of the time of writing, Unravel is an Intellyx customer. Intellyx retains final editorial control of this paper.

The post The Role of Data Operations During Cloud Migrations appeared first on Unravel.

Unravel Launches Performance Management and Cloud Migration Assessment Solution for Google Cloud Dataproc https://www.unraveldata.com/unravel-launches-performance-management-and-cloud-migration-assessment-solution-for-google-cloud-dataproc/ https://www.unraveldata.com/unravel-launches-performance-management-and-cloud-migration-assessment-solution-for-google-cloud-dataproc/#respond Thu, 17 Oct 2019 22:31:34 +0000 https://www.unraveldata.com/?p=3775 Cloud Google Cloud Platform


We are very excited to announce that Unravel is now available for Google Cloud Dataproc. If you are already on GCP Dataproc, or on a journey to migrate on-premises data workloads to it, Unravel is available immediately to accelerate and de-risk your Dataproc migration and ensure that performance SLAs and cost targets are achieved.

We introduce our support for Dataproc as Google Cloud enters what is widely referred to as ‘Act 2’ under the watch of new CEO Thomas Kurian. We can expect an acceleration in new product announcements, engagement initiatives with the partner ecosystem, and a restructured, enterprise-focused go-to-market model. We got a feel for this earlier this year at the San Francisco Google Next event, and we can expect a lot more from the highly anticipated Google Cloud Next ’19 event in London this November.

As cloud services adoption continues to accelerate, the complexity will only continue to present enterprise buyers with a bewildering array of choices. As outlined in our Understanding Cloud Data Services blog, wherever you are on your cloud adoption, platform evaluation, or workload migration journey, now is the time to accelerate your strategic thinking and execution planning for cloud-based data services.

We are already helping customers run their big data workloads on GCP IaaS and with this new addition, we now support cloud native big data workloads running on Dataproc. In addition, Unravel plans to cover other important analytics platforms (including Google BigQuery) as part of our roadmap. This ensures Unravel provides an end-to-end, ‘single pane of glass’ for enterprises to manage their data pipelines on GCP.

Learn more about Unravel for Google Dataproc here and as always please provide feedback so we can continue to deliver for your platform investments.

Press Release quoted below.

————————————————————————————————————————————————

Unravel Data Launches Performance Management and Cloud Migration Assessment Solution for Google Cloud Dataproc

PALO ALTO, Calif. – October 22, 2019 – Unravel Data, the only data operations platform providing full-stack visibility and AI-powered recommendations to drive more reliable performance in modern data applications, today introduced a performance management solution for the Google Cloud Dataproc platform, making data workloads running on top of the platform simpler to use and cheaper to run.

Unravel for Cloud Dataproc, which is available immediately, can improve the productivity of data teams with a simple and intelligent self-service performance management capability, helping dataops teams:

  • Optimize data pipeline performance and ensure application SLAs are adhered to
  • Monitor and automatically fix slow, inefficient and failing Spark, Hive, HBase and Kafka workloads
  • Maximize cost savings by containing resource-hogging users or applications
  • Get a detailed chargeback view to understand which users or departments are utilizing the system resources

For enterprises powered by modern data applications that rely on distributed data systems, the Unravel platform accelerates new cloud workload adoption by operationalizing a reliable data infrastructure, and it ensures enforceable SLAs and lower compute and I/O costs, while drastically lowering storage costs. Furthermore, it reduces operational overhead through rapid mean time to identification (MTTI) and mean time to resolution (MTTR), enabled by unified observability and AIOps capabilities.

“Unravel simplifies the management of data apps wherever they reside – on-premises, in a public cloud, or in a hybrid mix of the two. Extending our platform to Google Cloud Dataproc marks another milestone on our roadmap to radically simplify data operations and accelerate cloud adoption,” said Kunal Agarwal, CEO of Unravel Data. “As enterprises plan and execute their migrations to the cloud, Unravel enables operations and app development teams to improve the performance and reduce the risks commonly associated with these migrations.”

In addition to DataOps optimization, Unravel provides a cloud migration assessment offering to help organizations move data workloads to Google Cloud faster and with lower cost. Unravel has built a goal-driven and adaptive solution that uniquely provides comprehensive details of the source environment and applications running on it, identifies workloads suitable for the cloud and determines the optimal cloud topology based on business strategy, and then computes the anticipated hourly costs. The assessment also provides actionable recommendations to improve application performance and enables cloud capacity planning and chargeback reporting, as well as other critical insights.

“We’re seeing an increased adoption of GCP services for cloud-native workloads as well as on-premises workloads that are targets for cloud migration. Unravel’s full-stack DataOps platform can simplify and speed up the migration of data-centric workloads to GCP, giving customers peace of mind by minimizing downtime and lowering risk,” said Mike Leone, Senior Analyst, Enterprise Strategy Group. “Unravel adds operational and business value by delivering actionable recommendations for Dataproc customers. Additionally, the platform can troubleshoot and mitigate migration and operational issues to boost savings and performance for Cloud Dataproc workloads.”

Unravel for Google Cloud Dataproc is available now.

Create a free account. Sign up and get instant access to the Unravel environment. Take a guided tour of Unravel’s full-stack monitoring, AI-driven recommendations, automated tuning and remediation capabilities.


About Unravel Data
Unravel radically simplifies the way businesses understand and optimize the performance of their modern data applications – and the complex pipelines that power those applications. Providing a unified view across the entire stack, Unravel’s data operations platform leverages AI, machine learning, and advanced analytics to offer actionable recommendations and automation for tuning, troubleshooting, and improving performance – both today and tomorrow. By operationalizing how you do data, Unravel’s solutions support modern big data leaders, including Kaiser Permanente, Adobe, Deutsche Bank, Wayfair, and Neustar. The company is headquartered in Palo Alto, California, and is backed by Menlo Ventures, GGV Capital, M12, Point72 Ventures, Harmony Partners, Data Elite Ventures, and Jyoti Bansal. To learn more, visit unraveldata.com.

Copyright Statement
The name Unravel Data is a trademark of Unravel Data™. Other trade names used in this document are the properties of their respective owners.

Contacts
Jordan Tewell, 10Fold
unravel@10fold.com
1-415-666-6066

The post Unravel Launches Performance Management and Cloud Migration Assessment Solution for Google Cloud Dataproc appeared first on Unravel.

Accelerate and Reduce Costs of Migrating Data Workloads to the Cloud https://www.unraveldata.com/accelerate-and-reduce-costs-of-migrating-data-workloads-to-the-cloud/ https://www.unraveldata.com/accelerate-and-reduce-costs-of-migrating-data-workloads-to-the-cloud/#respond Wed, 31 Jul 2019 19:40:29 +0000 https://www.unraveldata.com/?p=3556


Today, Unravel announced a new cloud migration assessment offer to accelerate the migration of data workloads to Microsoft Azure, Amazon AWS, or Google Cloud Platform. Our latest offer fills a significant gap in the cloud journey, equips enterprises with the tools to deliver on their cloud strategy, and provides the best possible transition with insights and guidance before, during, and after migration. Full details on the assessment and its business value are in our announcement below.

So, why now?

The rapid increase in data volume and variety has driven organizations to rethink enterprise infrastructures and focus on longer-term data growth, flexibility, and cost savings. Current on-prem solutions are too complicated and inflexible, and they are not delivering the expected value. Data is not living up to its promise.

As an alternative, organizations are looking to cloud services like Azure, AWS, and Google Cloud to provide the flexibility to accommodate modern capacity requirements and elasticity. Unfortunately, organizations are often challenged by unexpected costs and a lack of data and insights to ensure a successful migration process. Left unaddressed, the complexity of these projects leads to migrations that don’t fulfill expectations and frequently result in significant cost overruns.

The cloud migration assessment offer provides details of the source environment and applications running on it, identifies workloads suitable for the cloud, and computes the anticipated hourly costs. It offers granular metrics, as well as broader insights, that eliminate transition complexity and deliver migration success.

Customers can be confident that they’re migrating the right data apps, configuring them properly in the cloud, meeting performance service level agreements, and minimizing costs. Unravel can provide an alternative to what is frequently a manual effort fraught with guesswork and errors.

The two approaches can be characterized per the diagram below.

Still unsure how the migration assessment will provide value to your business? Drop us a line to learn more about the offer – or download a sample cloud migration assessment report here.

————-

Read on to learn more about today’s news from Unravel.

Unravel Introduces Cloud Migration Assessment Offer to Reduce Costs and Accelerate the Transition of Data Workloads to Azure, AWS or Google Cloud

New Offer Builds a Granular Dependency Map of On-Premises Data Workloads and Provides Detailed Insights and Recommendations for the Best Transition to Cloud

PALO ALTO, Calif. – July 31, 2019 – Unravel Data, the only data operations platform providing full-stack visibility and AI-powered recommendations to drive more reliable performance in modern data applications, today announced a new cloud migration assessment offer to help organizations move data workloads to Azure, AWS or Google Cloud faster and with lower cost. Unravel has built a goal-driven and adaptive solution that uniquely provides comprehensive details of the source environment and applications running on it, identifies workloads suitable for the cloud, determines the optimal cloud topology based on business strategy, and computes the anticipated hourly costs. The offer also provides actionable recommendations to improve application performance and enables cloud capacity planning and chargeback reporting, as well as other critical insights.

“Managing the modern data stack on-premises is complex and requires expert technical talent to troubleshoot most problems. That’s why more enterprises are moving their data workloads to the cloud, but the migration process isn’t easy, as there’s little visibility into costs and configurations,” said Kunal Agarwal, CEO, Unravel Data. “Unravel’s new cloud migration assessment offer delivers actionable insights and visibility so organizations no longer have to fly blind. No matter where an organization is in its cloud adoption and migration journey, now is the time to accelerate strategic thinking and execution, and this offering ensures the fastest, most cost-effective and valuable transition for the full journey-to-cloud lifecycle.”

“Companies have major expectations when they embark on a journey to the cloud. Unfortunately, organizations that migrate manually often don’t fulfill these expectations as the process of transitioning to the cloud becomes more difficult and takes longer than anticipated. And then once there, costs rise higher than forecasted and apps are difficult to optimize,” said Enterprise Strategy Group senior analyst Mike Leone. “This all results from the lack of insight into their existing data apps on-premises and how they should map those apps to the cloud. Unravel’s new offer fills a major gap in the cloud journey, equipping enterprises with the tools to deliver on their cloud goals.”

The journey to cloud is technically complex and aligning business outcomes with a wide array of cloud offerings can be challenging. Unravel’s cloud migration assessment offer takes the guesswork and error-prone manual processes out of the equation to deliver a variety of critical insights. The assessment enables organizations to:

  • Discover current clusters and detailed usage to make an effective and informed move to the cloud
  • Identify and prioritize specific application workloads that will benefit most from cloud-native capabilities, such as elastic scaling and decoupled storage
  • Define the optimal cloud topology that matches specific goals and business strategy, minimizing risks or costs. Users get specific instance type recommendations and guidance on the amount of storage needed, with the option to choose between locally attached and object storage
  • Obtain the hourly costs expected to be incurred when moving to the cloud, allowing users to compare and contrast the costs for different cloud providers and services and for different goals
  • Compare costs for different cloud options (across IaaS and Managed Hadoop/Spark PaaS services). Includes the ability to override default on-demand prices to incorporate volume discounts users may have received
  • Optimize cloud storage tiering choices for hot, warm, and cold data

The Unravel cloud assessment service encompasses four phases. The first phase is a discovery meeting in which the project is scoped, stakeholders are identified, and KPIs are defined. Then, during technical discovery, Unravel works with customers to define use cases, install the product, and begin gathering workload data. Next comes the initial readout, in which enterprises receive a summary of their infrastructure and workloads along with fresh insights and recommendations for cloud migration. Finally comes the completed assessment, including final insights, recommendations, and next steps.

Unravel is building a rapidly expanding ecosystem of partners to provide a portfolio of data operations and migration services utilizing the Unravel Data Operations Platform and cloud migration assessment offer.

Enterprises can find a sample cloud migration assessment report here.

About Unravel Data
Unravel radically simplifies the way businesses understand and optimize the performance of their modern data applications – and the complex pipelines that power those applications. Providing a unified view across the entire stack, Unravel’s data operations platform leverages AI, machine learning, and advanced analytics to offer actionable recommendations and automation for tuning, troubleshooting, and improving performance – both today and tomorrow. By operationalizing how you do data, Unravel’s solutions support modern data stack leaders, including Kaiser Permanente, Adobe, Deutsche Bank, Wayfair, and Neustar. The company is headquartered in Palo Alto, California, and is backed by Menlo Ventures, GGV Capital, M12, Point72 Ventures, Harmony Partners, Data Elite Ventures, and Jyoti Bansal. To learn more, visit unraveldata.com.

Copyright Statement
The name Unravel Data is a trademark of Unravel Data™. Other trade names used in this document are the properties of their respective owners.

PR Contact
Jordan Tewell, 10Fold
unravel@10fold.com
1-415-666-6066

The post Accelerate and Reduce Costs of Migrating Data Workloads to the Cloud appeared first on Unravel.

Migrating big data workloads to Azure HDInsight https://www.unraveldata.com/migrating-big-data-workloads-to-azure-hdinsight/ https://www.unraveldata.com/migrating-big-data-workloads-to-azure-hdinsight/#respond Fri, 03 May 2019 03:17:06 +0000 https://www.unraveldata.com/?p=2649 Transitioning big data workloads to the cloud


This is a guest blog by Arnab Ganguly, Senior Program Manager for Azure HDInsight at Microsoft. This blog was first published on the Microsoft Azure blog.

Migrating big data workloads to the cloud remains a key priority for our customers, and Azure HDInsight is committed to making that journey simple and cost effective. HDInsight partners with Unravel, whose mission is to reduce the complexity of delivering reliable application performance when migrating data from on-premises or a different cloud platform onto HDInsight. Unravel’s Application Performance Management (APM) platform brings a host of services that provide unified visibility and operational intelligence to plan and optimize the migration process onto HDInsight.

  • Identify current big data landscape and platforms for baselining performance and usage.
  • Use advanced AI and predictive analytics to increase performance, throughput and to reduce application, data, and processing costs.
  • Automatically size cluster nodes and tune configurations for the best throughput for big data workloads.
  • Find, tier, and optimize storage choices in HDInsight for hot, warm, and cold data.

In our previous blog we discussed why the cloud is a great fit for big data and provided a broad view of what the journey to the cloud looks like, phase by phase. In this installment and the following parts we will examine each stage in that life cycle, diving into the planning, migration, operation and optimization phases. This blog post focuses on the planning phase.

Phase one: Planning

In the planning stage you must understand your current environment, determine high priority applications to migrate, and set a performance baseline to be able to measure and compare your on-premises clusters versus your Azure HDInsight clusters. This raises the following questions that need to be answered during the planning phase:

On-premises environment

  • What does my current on-premises cluster look like, and how does it perform?
  • How much disk, compute, and memory am I using today?
  • Who is using it, and what apps are they running?
  • Which of my workloads are best suited for migration to the cloud?
  • Which big data services (Spark, Hadoop, Kafka, etc.) are installed?
  • Which datasets should I migrate?

Azure HDInsight environment

  • What are my HDInsight resource requirements?
  • How do my on-premises resource requirements map to HDInsight?
  • How much and what type of storage would I need on HDInsight, and how will my storage requirements evolve with time?
  • Would I be able to meet my current SLAs or better them once I’ve migrated to HDInsight?
  • Should I use manual scaling or auto-scaling HDInsight clusters, and with what VM sizes?

Baselining on-premises performance and resource usage

To effectively migrate big data pipelines from physical to virtual data centers, one needs to understand the dynamics of on-premises workloads, usage patterns, resource consumption, dependencies and a host of other factors.

Unravel creates on-premises cluster discovery reports in minutes

Unravel provides detailed reports of on-premises clusters including total memory, disk, number of hosts, and number of cores used. This cluster discovery report also delivers insights on cluster topology, running services, operating system version and more. Resource usage heatmaps can be used to determine any unique needs for Azure.

Unravel on-premises cluster discovery reporting
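To make the baselining concrete, here is a minimal Python sketch of the kind of cluster-level roll-up such a discovery report is built on; the host inventory, names, and specs are hypothetical placeholders, not output from Unravel.

```python
# Minimal sketch: aggregate an on-premises host inventory into cluster-level
# totals for baselining. The host list and its fields are illustrative
# stand-ins for whatever your inventory tooling exports.
hosts = [
    {"name": "dn01", "cores": 32, "memory_gb": 256, "disk_tb": 24},
    {"name": "dn02", "cores": 32, "memory_gb": 256, "disk_tb": 24},
    {"name": "nn01", "cores": 16, "memory_gb": 128, "disk_tb": 4},
]

totals = {
    "hosts": len(hosts),
    "cores": sum(h["cores"] for h in hosts),
    "memory_gb": sum(h["memory_gb"] for h in hosts),
    "disk_tb": sum(h["disk_tb"] for h in hosts),
}
print(totals)  # e.g. {'hosts': 3, 'cores': 80, 'memory_gb': 640, 'disk_tb': 52}
```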

Gain key app usage insights from cluster workload analytics and data insights

Unravel can highlight application workload seasonality by user, department, application type and more to help calibrate and make the best use of Azure resources. This type of reporting can greatly aid in HDInsight cluster design choices (size, scale, storage, scalability options, etc.) to maximize your ROI on Azure expenses.

Unravel also provides data insights to enable decision making on the best strategy for storage in the cloud, by looking at specific metrics on usage patterns of tables and partitions in the on-premises cluster.

Unravel tiered storage detail for tables

It can also identify unused or cold data. Once identified, you can decide on the appropriate layout for that data in the cloud and make the best use of your Azure budget. Based on this information, you can distribute datasets most effectively across HDInsight storage options. For example, the hottest data can be stored on disk or the highly performant object storage of Azure Data Lake Storage Gen 2 (hot), and the least used data on the relatively less performant Azure Blob storage (cold).
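As a rough illustration of that tiering decision, the Python sketch below buckets tables by last-access age; the thresholds, table names, and tier labels are assumptions for illustration, not Unravel’s or Microsoft’s actual policy.

```python
from datetime import datetime, timedelta

# Minimal sketch: bucket tables into storage tiers by last-access age.
# Thresholds and table metadata are illustrative assumptions; a real decision
# would use the access metrics surfaced by your discovery tooling.
now = datetime(2019, 5, 1)
tables = [
    {"name": "clickstream_raw", "last_accessed": now - timedelta(days=2),   "size_gb": 900},
    {"name": "orders_2017",     "last_accessed": now - timedelta(days=400), "size_gb": 350},
    {"name": "dim_customer",    "last_accessed": now - timedelta(days=20),  "size_gb": 12},
]

def tier(table):
    age_days = (now - table["last_accessed"]).days
    if age_days <= 7:
        return "ADLS Gen2 (hot)"      # frequently read, keep on performant storage
    if age_days <= 90:
        return "ADLS Gen2 (cool)"     # occasionally read
    return "Blob storage (archive)"   # cold data, cheapest tier

for t in tables:
    print(t["name"], "->", tier(t))
```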

Data migration

Migrate on-premises data to Azure

There are two main options to migrate data from on-premises to Azure. Learn more about the processes and data migration best practices.

  1. Transfer data over the network with TLS (see the sketch after this list)
    • Over the internet. Transfer data to Azure storage over a regular internet connection.
    • ExpressRoute. Create private connections between Microsoft datacenters and infrastructure on-premises or in a colocation facility.
    • Data Box online data transfer. Data Box acts as network storage gateways to manage data between your site and Azure.
  2. Shipping data offline
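For the network-transfer option, a minimal upload might look like the Python sketch below, which assumes the azure-storage-file-datalake SDK; the account name, credential, file system, and paths are placeholders, and large-scale migrations would typically rely on AzCopy, ExpressRoute, or Data Box as described above.

```python
# Minimal sketch of the "over the network" option using the Azure Data Lake
# Storage Gen2 Python SDK (azure-storage-file-datalake). All names below are
# placeholders; credentials and permissions are assumed to be in place.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key-or-token>",
)
file_system = service.get_file_system_client(file_system="migrated-warehouse")

with open("/data/warehouse/events/part-00000.parquet", "rb") as source:
    file_client = file_system.get_file_client("events/part-00000.parquet")
    file_client.upload_data(source, overwrite=True)
```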

Once you’ve identified which workloads to migrate, the planning gets a little more involved, requiring a proper APM tool to get the rest right. For everything to work properly in the cloud, you need to map out workload dependencies as they currently exist on-premises. This may be challenging when done manually, as these workloads rely on many different complex components. Incorrectly mapping these dependencies is one of the most common causes of big data application breakdowns in the cloud.

The Unravel platform provides a comprehensive and immediate readout of all the big data stack components involved in a workload. For example, it could tell you that a streaming app is using Kafka, HBase, Spark, and Storm, and detail each component’s relationship with one another while also quantifying how much the app relies on each of these technologies. Knowing that the workload relies far more on Spark than Storm allows you to avoid under-provisioning Spark resources in the cloud and over-provisioning Storm.
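As a simple illustration of capturing such a dependency map, the Python sketch below records each component’s approximate share of a streaming workload; the weights are hypothetical, not measurements from Unravel.

```python
# Minimal sketch: represent a workload's component dependencies and their
# relative weight (share of runtime, illustrative numbers) so that cloud
# resources are provisioned in proportion to actual reliance.
streaming_app = {
    "Kafka": 0.15,   # ingest
    "Spark": 0.55,   # most of the processing happens here
    "HBase": 0.20,   # serving layer
    "Storm": 0.10,   # legacy alerting path
}

# Provision in proportion to reliance rather than evenly per component.
for component, share in sorted(streaming_app.items(), key=lambda kv: -kv[1]):
    print(f"{component}: {share:.0%} of workload runtime")
```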

Resource management and capacity planning

Organizations face a similar challenge in determining the resources, such as disk, compute, and memory, that they will need for the workload to run efficiently in the cloud. It’s a challenge to determine utilization metrics of these resources for on-premises clusters, and which services are consuming them. Unravel provides reports that precisely bring forth quantitative metrics around the resources consumed by each big data workload. If resources have been overprovisioned and thereby wasted, as happens unknowingly in many organizations, the platform provides recommendations to reconfigure applications to maximize efficiency and optimize spend. These resource settings are then translated to Azure.

Since cloud adoption is an ongoing and iterative process, customers might want to look ahead and think about how resource needs will evolve throughout the year as business needs change. Unravel leverages predictive analytics based on previous trends to determine resource requirements in the cloud for up to six months out.
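To illustrate the idea of trend-based forecasting (not Unravel’s actual model), here is a minimal Python sketch that fits a linear trend to made-up monthly peak memory figures and extrapolates six months ahead.

```python
import numpy as np

# Minimal sketch: fit a linear trend to monthly peak memory usage (GB) and
# extrapolate six months out. The history values are illustrative only.
months = np.arange(12)                       # last 12 months
peak_memory_gb = np.array([410, 425, 430, 455, 470, 480,
                           500, 520, 540, 555, 575, 600])

slope, intercept = np.polyfit(months, peak_memory_gb, 1)
future = np.arange(12, 18)                   # next 6 months
forecast = slope * future + intercept
print([round(v) for v in forecast])
```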

For example, workloads such as fraud detection employ several datasets including ATM transaction data, customer account data, charge location data, and government fraud data. Once in Azure, some apps require certain datasets to remain in Azure in order to work properly, while other datasets can remain on-premises without issue. Like app dependency mapping it’s difficult to determine which datasets an app needs to run properly. Other considerations are security, data governance laws (some sensitive data must remain in private datacenters in certain jurisdictions), as well as the size of data. Based on Unravel’s resource management and capacity planning reports customers can efficiently manage data placement in HDInsight storage options and on-premises to best suit their business requirements.

Capacity planning and chargeback

Unravel brings some additional visibility and predictive capabilities that can remove a lot of mystery and guesswork around Azure migrations. Unravel analyzes your big data workloads both on-premises and on Azure HDInsight, and can provide chargeback reporting by user, department, application type, queue, or other customer-defined tags.

Unravel chargeback reporting by user, application, department, and more
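A chargeback roll-up of this kind can be sketched in a few lines of Python with pandas; the job records, tags, and costs below are illustrative placeholders.

```python
import pandas as pd

# Minimal sketch: roll up per-job cost into a chargeback view by department
# and user. The job records and cost figures are illustrative.
jobs = pd.DataFrame([
    {"user": "ana", "department": "marketing", "app": "attribution", "cost_usd": 42.10},
    {"user": "raj", "department": "risk",      "app": "fraud-score", "cost_usd": 77.50},
    {"user": "ana", "department": "marketing", "app": "attribution", "cost_usd": 39.80},
    {"user": "li",  "department": "risk",      "app": "backtest",    "cost_usd": 18.25},
])

chargeback = (jobs.groupby(["department", "user"])["cost_usd"]
                  .sum()
                  .sort_values(ascending=False))
print(chargeback)
```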

Cluster sizing and instance mapping

As the final part of the planning phase, one will need to decide on the scale, VM sizes, and type of Azure HDInsight clusters to fit the workload type. This will depend on the business use case and priority of the given workload. For example, a recommendation engine that needs to meet a stringent SLA at all times might require an auto-scaling HDInsight cluster so that it always has the compute resources it needs, but can also scale down during lean periods to optimize costs. Conversely, if you have a workload that is fixed in resource requirements, such as a predictable batch processing app, you might want to deploy manual-scaling HDInsight clusters and size them optimally with the right VM sizes to keep costs under control.

Since the choice of HDInsight VM instances is key to the success of the migration, Unravel can infer the seasonality of big data workloads and deliver recommendations for optimal server instance sizes in minutes instead of hours or days.

Unravel instance mapping by workload

Given the default virtual machine sizes for HDInsight clusters provided by Microsoft, Unravel provides some additional intelligence to help choose the correct virtual machine sizes for data workloads based on three different migration strategies:

  1. Lift and shift – If on-premises clusters collectively had 200 cores, 20 terabytes of storage, and 500 GB of memory, Unravel will provide a close mapping to the Azure VM environment. This strategy ensures that the overall Azure HDInsight deployment will have the same (or more) resources available as the current on-premises environment. This works to minimize any risks associated with under-provisioning HDInsight for the migrating workloads.
  2. Cost reduction – This provides a one-to-one mapping of each existing on-premises host to the most suitable Azure Virtual Machine on HDInsight, such that it matches the actual resource usage. This determines a cost-optimized closest fit per host by matching each VM's published specifications to the actual usage of the host. If your on-premises hosts are underutilized, this method will always be less expensive than lift and shift.
  3. Workload fit – This strategy consumes application runtime data that Unravel has collected and offers the flexibility of provisioning Azure resources to provide 100 percent SLA compliance. It can also allow choosing a lower target, say 90 percent compliance, as pictured below. The flexibility of the workload fit configuration enables the right price-to-performance trade-off in Azure.

Unravel allows for flexibility around SLA compliance in capacity planning for your Azure clusters and can compute average hourly cost at each percentile.
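To make the cost reduction idea concrete, here is a minimal Python sketch that maps a host’s actual usage to the cheapest VM that still covers it; the VM names, specs, and prices are hypothetical placeholders, not real Azure SKUs or published rates.

```python
# Minimal sketch of the "cost reduction" idea: map each host's *actual* usage
# to the cheapest VM size that still covers it. The catalog is hypothetical.
vm_catalog = [
    {"name": "small",  "cores": 4,  "memory_gb": 32,  "usd_per_hour": 0.20},
    {"name": "medium", "cores": 8,  "memory_gb": 64,  "usd_per_hour": 0.40},
    {"name": "large",  "cores": 16, "memory_gb": 128, "usd_per_hour": 0.80},
]

def closest_fit(used_cores, used_memory_gb):
    candidates = [vm for vm in vm_catalog
                  if vm["cores"] >= used_cores and vm["memory_gb"] >= used_memory_gb]
    return min(candidates, key=lambda vm: vm["usd_per_hour"]) if candidates else None

# A 16-core host that only ever uses 6 cores / 40 GB maps to "medium", not "large".
print(closest_fit(used_cores=6, used_memory_gb=40)["name"])  # medium
```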

Conclusion

The planning phase is the critical first step towards any workload migration to HDInsight. Many organizations lack effective quantitative and qualitative guidance like that provided by Unravel APM during the critical planning process, and may face challenges downstream in areas of workload execution and cost optimization. Unravel's robust APM platform can help navigate this planning-phase complexity by providing tools for mapping workload dependencies, forecasting resource usage, and guiding decisions on which datasets to move, and this in turn can make the migration process much more efficient, data-driven, and ultimately successful.

In our upcoming blog, we’ll look closely at migration to HDInsight.

The post Migrating big data workloads to Azure HDInsight appeared first on Unravel.

Planning and Migration For Modern Data Applications in the Cloud https://www.unraveldata.com/getting-the-most-from-data-apps-cloud/ https://www.unraveldata.com/getting-the-most-from-data-apps-cloud/#respond Tue, 26 Mar 2019 12:57:10 +0000 https://www.unraveldata.com/?p=2505


Current trends indicate that the cloud is a boon for big data. Conversations with our customers also clearly indicate this trend of data workloads and pipelines moving to the cloud. More and more organizations across industries are running – or are looking to run – data pipelines, machine learning, and AI in the cloud. Until now, there has not been an easy way to migrate, deploy, and manage data-driven applications in the cloud, and getting the most from modern data applications there requires data-driven planning and execution.

Unravel provides full-stack visibility and AI-powered guidance to help customers understand and optimize the performance of their data-driven applications and monitor, manage and optimize their modern data stacks. This applies as much to clusters located in the cloud as it does to modern data clusters on-premise. Specifically, for the cloud, our goal is to cover the entire gamut below:

  • IaaS (Infrastructure as a Service): Cloudera, Hortonworks or MapR data platforms deployed on cloud VMs where your modern data applications are running
  • PaaS (Platform as a Service): Managed Hadoop/Spark Platforms like AWS Elastic MapReduce (EMR), Azure HDInsight, Google Cloud Dataproc etc.
  • Cloud-Native: Products like Amazon Redshift, Azure Databricks, AWS Databricks etc.
  • Serverless: Ready to use, no setup needed services like Amazon Athena, Google BigQuery, Google Cloud DataFlow etc.

We have also learnt that enterprises tend to use a combination of one or more of the above to solve their modern data stack needs. In addition, it is not uncommon to have more than one cloud provider in use (Multi-Cloud Strategy). Often workloads and data are also distributed between on-premise and cloud clusters (Hybrid Strategy).

This blog covers Unravel’s current capabilities in the cloud, what is currently in the works, and what is on our longer term roadmap. Most importantly we talk about how you can participate and be part of this wave with us!

Looking to Migrate your Modern Data Workloads to the Cloud?

Many enterprises today are looking to migrate their modern data workloads from on-premises to the cloud. The goals vary from improving ease of management, to increasing elasticity to assure SLAs for bursty workloads, to reducing costs.

Unravel truly understands your modern data cluster and the details of the workloads running on it (and what can be done to improve reliability, efficiency, and time to root-cause issues). This information and analysis are key for initiatives like migrating your modern data workloads to the cloud as well. So Unravel has done precisely this, providing features to help plan and migrate your modern data stack applications to the cloud based on your specific goals (e.g., cost reduction, increased SLA, agility). We have built a goal-driven and adaptive solution for migrating modern data stack applications to the cloud.

Pre-Migration: Planning

Cluster Discovery

Migrating modern data stack workloads to the cloud requires extensive planning, which in turn begins with developing a really good understanding of the current (on-premises) environment. Specifically, it requires details around cluster resources and their usage.

Cluster Details

What’s my modern data stack cluster like? What are the services deployed? How many nodes does it have? What are the resources allocated and what is their consumption like across the entire cluster?

Usage Details

What applications are running on it? What is the distribution of application runs across users/departments? How many resources are these applications utilizing? Which of these applications are bursty in terms of resource utilization? Are some of these not running reliably due to lack of resources?

Let’s see how Unravel can help you discover and piece together such information:

Unravel’s cluster discovery reporting

As you can see, Unravel provides a single pane of glass to display all of the relevant information, e.g., the services deployed, the topology of the cluster, and cluster-level stats (suitably aggregated over the entire cluster’s resources) in terms of CPU, memory, and disk.

The across-cluster heatmaps display the relative utilization of the cluster across time (a 24×7 view for each hour of, say, a week). If utilization peaks on specific days and times of the week, you can design the cloud cluster so that it scales only for those precise times (keeping costs low when resources are not needed).
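For readers who want to reproduce a similar view from their own metrics, the Python sketch below (using pandas and synthetic samples) builds a day-of-week by hour-of-day utilization heatmap; it illustrates the shape of the analysis, not Unravel’s implementation.

```python
import pandas as pd
import numpy as np

# Minimal sketch: turn per-interval utilization samples into a day-of-week x
# hour-of-day heatmap, the shape used to spot when a cluster could scale down.
# The synthetic samples stand in for real metrics.
idx = pd.date_range("2019-03-01", periods=7 * 24 * 4, freq="15min")
samples = pd.DataFrame({
    "ts": idx,
    "cpu_pct": np.random.default_rng(0).uniform(10, 95, size=len(idx)),
})

samples["day"] = samples["ts"].dt.day_name()
samples["hour"] = samples["ts"].dt.hour
heatmap = samples.pivot_table(index="day", columns="hour",
                              values="cpu_pct", aggfunc="mean")
print(heatmap.round(0))
```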

Unravel’s Cluster Discovery Report is purpose built for migrating data workloads to the cloud. All the relevant information is readily made available and all the number crunching is done to provide the most relevant details in a single pane of glass.

Identifying Applications Suitable for the Cloud

After developing an understanding of the cluster, the next step in the cloud migration journey is to figure out which modern data stack workloads may be most suitable to move to the cloud and would result in maximum benefits (e.g., increased SLAs or reduced costs). So it would be useful to discover, for example, applications of the following kinds:

Bursty in nature

Applications that take varying amounts of time and/or resources to complete can be good candidates to move to the cloud. These are typically frequently failing or highly variable applications, hampered by lack of resources, contention, and bottlenecks, and they could be better suited to the cloud so that they run more reliably and SLAs can be met. Unravel can help you easily identify applications that are bursty in nature.

Discover apps that are bursty in nature using Unravel

Unravel enables you to easily identify applications that could run more reliably in the cloud
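A rough way to approximate this kind of burstiness screen is to flag applications whose run durations vary widely, as in the Python sketch below; the applications and durations are made up, and Unravel’s own detection is more sophisticated.

```python
from statistics import mean, pstdev

# Minimal sketch: flag applications whose run durations vary widely (high
# coefficient of variation) as "bursty" candidates for elastic cloud capacity.
# Durations (minutes) are illustrative.
app_runs = {
    "nightly_etl":    [42, 44, 41, 43, 45],
    "ad_hoc_scoring": [12, 95, 30, 160, 18],
    "report_refresh": [15, 16, 15, 14, 17],
}

for app, durations in app_runs.items():
    cv = pstdev(durations) / mean(durations)
    label = "bursty" if cv > 0.5 else "steady"
    print(f"{app}: cv={cv:.2f} -> {label}")
```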

 

Applications by tenants specific to the Business

Often, corporations strategically decide to carry out the migration of modern data stack workloads to the cloud in a phased manner. They may decide to migrate applications belonging to specific users and/or queues first, followed by others in a later phase (based on, say, the number of apps, resources used, etc.). So it becomes important to have a clear view into the distribution of these applications across various categories.

Unravel provides a clear graphical view into such information. Also, admins can explicitly tag certain application types in Unravel to achieve custom categorization such as grouping applications by department or project.

Unravel: Distribution of apps in different categories

Mapping your On-Premise Cluster to a Deployment Topology in the Cloud

Unravel provides you a mapping of your current on-premises environment to a cloud-based one. Cloud Mapping Reports tell you the precise details of what types of VMs you would need, how many, and what they would cost you.

You can choose different strategies for this mapping based on your goal(s). For each case, the resulting report will provide you the details of the cloud topology to match the goal.

In addition, each of these strategies is aware of multiple cloud providers (AWS EMR, Azure HDInsight, GCP, etc.) and the specs and cost of each VM, and optimizes for mapping your actual workloads to the most effective VMs in the cloud.

Lift and Shift Strategy

This report provides a one-to-one mapping of each existing on-premises host to the most suitable instance type in the cloud, such that it meets or exceeds the host's hardware specs. This strategy ensures that your cloud deployment will have the same (or a greater) amount of resources available as your current on-premises environment and minimizes any risks associated with migrating to the cloud.

Unravel provides you a mapping of your current on-premises environment to a cloud-based one. Strategy: Lift and Shift

Cost Reduction Strategy

This report provides a one-to-one mapping of each existing on-premises host to the most suitable Azure VM (or EC2/EMR/GCP equivalent) such that it matches the actual resource usage. This determines a cost-optimized closest fit per host by matching the VM's specs to the actual usage of the host. If your on-premises hosts are underutilized, this method will always be less expensive than lift and shift.

Unravel provides you a mapping of your current on-premises environment to a cloud-based one. Strategy: Cost Reduction

Workload Fit Strategy

Unlike the other methods, this is not a one-to-one mapping. Unravel analyzes your workload for the time period and bases its recommendations on that workload. Unravel provides multiple recommendations based on the resources needed to meet X% of your workloads, and determines the optimal assignment of VM types to meet the requirements while minimizing cost. This is typically the most cost-effective method.

Unravel provides you a mapping of your current on-premises environment to a cloud-based one. Strategy: “Workload Fit” (meeting requirements for 85% of the workloads)

The workload fit strategy also enables enterprises to pick the right price-to-performance trade-offs in the cloud. For example, if less stringent SLAs are acceptable, costs can be further reduced by choosing a less expensive VM type.

The above is a sampling of the mapping results, showing the recommended cloud cluster topology for a given on-premises one. The mapping is done in accordance with your inputs about your specific organizational needs, e.g., the provider you have decided to migrate to, whether or not to separate compute and storage, the use of a specific set of instance types, any discounts you get on pricing from the cloud providers, and so on. You can even compare the costs for your specific case across different cloud providers and decide on the one that best suits your goals.
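As a toy illustration of that comparison, the Python sketch below totals hourly cost for two hypothetical providers and applies a negotiated discount; the node counts, rates, and provider names are placeholders rather than real pricing.

```python
# Minimal sketch: compare the estimated hourly cost of the same node plan
# across providers, with an optional negotiated discount applied.
plans = {
    "ProviderA": {"usd_per_node_hour": 0.52, "nodes": 40},
    "ProviderB": {"usd_per_node_hour": 0.48, "nodes": 44},
}
discounts = {"ProviderA": 0.15, "ProviderB": 0.0}   # e.g. 15% enterprise discount

for provider, plan in plans.items():
    hourly = plan["usd_per_node_hour"] * plan["nodes"] * (1 - discounts[provider])
    print(f"{provider}: ${hourly:.2f}/hour")
```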

Unravel also provides a matching of services you have in your on-premise cluster with the ones made available by the specific cloud provider chosen (and whether it is a strong or weak match). The goal of this mapping is to give you a good sense of what applications may need some rewriting and which ones may be easily portable. For example, if you have been using Impala on your on-premise cluster, which is not available on EMR, we will suggest the closest service to use on migrating to EMR.

There are also several other Unravel features and functionalities to help determine how best to create the modern data stack cluster in the cloud. For example, check out the cluster workload heatmap:

Heatmap of workload profile: Sunday has several hot hours, Saturday is the hottest day, and Monday is hardly used, so the cluster could be scaled down drastically on Mondays

As you can see above, Unravel analyzes and showcases seasonality to suggest how you can best make use of the cloud's auto-scaling capabilities. As you design your cloud cluster topology, you can use this information to set up appropriate processes for scaling clusters up and down, ensuring desired SLAs while reducing costs at the same time.
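The Python sketch below shows, with illustrative utilization numbers, how such a weekly profile could be turned into a simple scale-up/scale-down schedule; the thresholds and node counts are assumptions, not Unravel recommendations.

```python
# Minimal sketch: turn a weekly utilization profile into a simple
# scale-up / scale-down schedule. Utilization numbers are illustrative;
# in practice these would come from the heatmap above.
weekly_peak_util = {          # average CPU % by day
    "Monday": 18, "Tuesday": 64, "Wednesday": 71, "Thursday": 68,
    "Friday": 75, "Saturday": 92, "Sunday": 83,
}
BASE_NODES, MAX_NODES = 10, 40

for day, util in weekly_peak_util.items():
    if util >= 80:
        target = MAX_NODES                      # hot days: scale out fully
    elif util >= 40:
        target = MAX_NODES // 2                 # normal weekdays
    else:
        target = BASE_NODES                     # e.g. Mondays: scale way down
    print(f"{day}: target {target} worker nodes")
```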

Decide on the best strategy for the storage in the cloud by looking at granular and specific information about the usage patterns of your tables and partitions in your on-premise cluster. Unravel can tell you which tables in your storage are least and most used and those that are used moderately.

Unravel can identify unused/cold data. You can decide on the appropriate layout for your data in the cloud accordingly and make the best use of your budget.

 

The determination of these labels is based on your specific configuration policy, so it can be tailored to your organization's business goals. Based on this information, you may decide to distribute these tables across the various types of storage in the cloud: the most used on disk or high-performance object storage (e.g., AWS S3 Standard or Azure Data Lake Storage Gen2 Hot), and the least used on lower-performance object storage (e.g., AWS S3 Glacier or Azure Data Lake Storage Gen2 Archive). In the example above, approximately half the data is never used and could safely be moved to cheaper archival storage when migrating to the cloud.
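
A minimal sketch of such a tiering policy is shown below. The access counts, idle-time thresholds, and tier names are assumptions for illustration; a real policy would be driven by your own configuration, as described above.

```python
# Sketch: bucket tables into storage tiers from access statistics.
# Counts, thresholds, and the example tables are illustrative assumptions.
from datetime import date

TODAY = date(2021, 1, 1)

tables = [
    # (table name, accesses in last 90 days, last access date)
    ("sales_daily",         1200, date(2020, 12, 30)),
    ("clickstream_2018",       0, date(2018, 11, 2)),
    ("inventory_snapshots",    8, date(2020, 10, 1)),
]

def tier(accesses, last_access):
    days_idle = (TODAY - last_access).days
    if accesses == 0 or days_idle > 365:
        return "archive (e.g. S3 Glacier / ADLS Gen2 Archive)"
    if accesses < 20:
        return "cool object storage"
    return "hot object storage (e.g. S3 Standard / ADLS Gen2 Hot)"

for name, accesses, last in tables:
    print(f"{name:22s} -> {tier(accesses, last)}")
```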

Unravel also analyzes which partitions could be reclaimed to save storage space. Based on this information, you can decide on a trade-off: store some partitions in archival storage in the cloud, or remove them altogether and reduce costs.

 

During and Post-Migration: Tracking the Migration and Its Success

Unravel can help you track the success of the migration as you move your modern data stack applications from the on-premises cluster to the cloud cluster. You can use Unravel to compare how a given application performed on-premises and how it is doing in its new home. If the performance is not up to par, Unravel's insights and recommendations can help bring your migration back on track.

Compare how an app is doing in its new environment. This app is ~17 times slower in the cloud. Unravel provides automatic fixes to get the app back to meeting its SLA.
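
As a rough illustration of this kind of before/after comparison, the sketch below flags applications whose cloud runtime has regressed against an on-premises baseline. The run history and the 1.5x tolerance are assumptions made for the example, not Unravel's actual logic.

```python
# Sketch: flag apps whose cloud runtime regressed versus the on-premises
# baseline. Durations (minutes) and the tolerance factor are invented.

baseline_runs = {"etl_orders": 12.0, "daily_report": 30.0}   # on-premises
cloud_runs    = {"etl_orders": 200.0, "daily_report": 28.0}  # after migration

REGRESSION_FACTOR = 1.5  # assumed tolerance before an app is flagged

for app, baseline in baseline_runs.items():
    current = cloud_runs.get(app)
    if current is None:
        continue  # app not yet migrated or not yet observed in the cloud
    ratio = current / baseline
    if ratio > REGRESSION_FACTOR:
        print(f"{app}: {ratio:.1f}x slower in the cloud -> investigate "
              f"(container sizing, shuffle/IO config, data locality)")
    else:
        print(f"{app}: within tolerance ({ratio:.1f}x of baseline)")
```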

 

End of Part I

Planning and migration are, of course, only one step in the journey to realizing the full value of big data in the cloud. In Parts II and III of the series, I will cover performance optimization and cloud cost reduction strategies; troubleshooting, debugging, and root cause analysis; and related topics.

In the meantime, check out the Unravel Cloud Data Operations Guide, which covers some of the topics from this blog series.

The post Planning and Migration For Modern Data Applications in the Cloud appeared first on Unravel.

]]>
https://www.unraveldata.com/getting-the-most-from-data-apps-cloud/feed/ 0
Transitioning big data workloads to the cloud: Best practices from Unravel Data https://www.unraveldata.com/transitioning-big-data-workloads-cloud-best-practices/ https://www.unraveldata.com/transitioning-big-data-workloads-cloud-best-practices/#respond Sat, 16 Mar 2019 04:10:04 +0000 https://www.unraveldata.com/?p=2416 TRANSITIONING BIG DATA WORKLOADS TO THE CLOUD: BEST PRACTICES FROM UNRAVEL DATA

Migrating on-premises Apache Hadoop® and Spark workloads to the cloud remains a key priority for many organizations. In my last post, I shared “Tips and tricks for migrating on-premises Hadoop infrastructure to Azure HDInsight.” In this […]

The post Transitioning big data workloads to the cloud: Best practices from Unravel Data appeared first on Unravel.

]]>
TRANSITIONING BIG DATA WORKLOADS TO THE CLOUD: BEST PRACTICES FROM UNRAVEL DATA

Migrating on-premises Apache Hadoop® and Spark workloads to the cloud remains a key priority for many organizations. In my last post, I shared “Tips and tricks for migrating on-premises Hadoop infrastructure to Azure HDInsight.” In this series, one of HDInsight’s partners, Unravel Data, will share their learnings, best practices, and guidance based on their insights from helping migrate many on-premises Hadoop and Spark deployments to the cloud.

Unravel Data is an AI-driven Application Performance Management (APM) solution for managing and optimizing modern data stack workloads. Unravel Data provides a unified, full-stack view of apps, resources, data, and users, enabling users to baseline and manage app performance and reliability, control costs and SLAs proactively, and apply automation to minimize support overhead. Ops and Dev teams use Unravel Data’s unified capability for on-premises workloads and to plan, migrate, and operate workloads on Azure. Unravel Data is available on the HDInsight Application Platform.

Today’s post, which kicks off the five-part series, comes from Shivnath Babu, CTO and Co-Founder at Unravel Data. This blog series will discuss key considerations in planning for migrations. Upcoming posts will outline the best practices for cloud migration, operation, and optimization phases of the cloud adoption lifecycle for big data.

Unravel Data’s perspective on migration planning

The cloud is helping to accelerate big data adoption across the enterprise. But while this provides the potential for much greater scalability, flexibility, optimization, and lower costs for big data, there are certain operational and visibility challenges that exist on-premises that don’t disappear once you’ve migrated workloads away from your data center.

Time and time again, we have experienced situations where migration is oversimplified and considerations such as application dependencies and system version mapping are not given due attention. This results in cost overruns through over-provisioning or production delays through provisioning gaps.

Businesses today are powered by modern data applications that rely on a multitude of platforms. These organizations desperately need a unified way to understand, plan, optimize, and automate the performance of their modern data apps and infrastructure. They need a solution that will allow them to quickly and intelligently resolve performance issues for any system through full-stack observability and AI-driven automation. Only then can these organizations keep up as the business landscape continues to evolve, and be certain that big data investments are delivering on their promises.

Current challenges in big data

Today, IT uses many disparate technologies and siloed approaches to manage the various aspects of their modern data apps and big data infrastructure.

Many existing monitoring solutions do not provide end-to-end support for modern data stack environments, lack full-stack compatibility, or require complex instrumentation, including configuration changes to applications and their components that demand deep subject matter expertise. The murky soup of monitoring tools that organizations currently rely on doesn't deliver the application agility the business requires.

The result is a poor user experience, inefficiencies, and mounting costs, as organizations buy more and more tools to solve these problems and then have to spend additional resources managing and maintaining those tools.

Organizations also see high Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR) figures because it is hard to understand dependencies and stay focused on root cause analysis. The lack of granularity and end-to-end visibility makes it impossible to remedy all of these problems, and businesses are stuck in a state of limbo.

It’s not an option to continue doing what was done in the past. Teams need a detailed appreciation of what they are doing today, what gaps they still have, and what steps they can take to improve business outcomes. It’s not uncommon to see 10x or more improvements in root cause analysis and remediation times for customers who are able to gain a deep understanding of the current state of their big data strategy and make a plan for where they need to be.

Starting your big data journey to the cloud

Without a unified APM platform, the challenges only intensify as enterprises move big data to the cloud. Cloud adoption is not a finite process with a clear start and end date; it's an ongoing lifecycle with four broad phases: planning, migration, operation, and optimization. Below, we briefly discuss some of the key challenges and questions that arise for organizations, which we will dive into in further detail in subsequent posts.

In the planning phase, key questions may include:

  • “Which apps are best suited for a move to the cloud?”
  • “What are the resource requirements?”
  • “How much disk, compute, and memory am I using today?”
  • “What do I need over the next 3, 6, 9, and 12 months?” (see the sizing sketch after this list)
  • “Which datasets should I migrate?”
  • “Should I use permanent, transient, autoscaling, or spot instances?”
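
As an illustration of the sizing questions above, here is a minimal sketch that projects peak compute demand a few months forward from recent usage. The usage series and the simple growth model are assumptions for the example only; real planning would draw on full cluster metrics history.

```python
# Sketch: project future peak compute demand from recent monthly peaks.
# The series and the naive compound-growth model are illustrative assumptions.

monthly_peak_vcores = [220, 235, 250, 270, 290, 310]  # last 6 months (assumed)

# Average month-over-month growth rate, applied forward.
growth = (monthly_peak_vcores[-1] / monthly_peak_vcores[0]) ** (
    1 / (len(monthly_peak_vcores) - 1)
)
latest = monthly_peak_vcores[-1]

for months_ahead in (3, 6, 9, 12):
    projected = latest * growth ** months_ahead
    print(f"{months_ahead:2d} months out: ~{projected:.0f} peak vCores")
```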

During migration, which can be a long-running process as workloads are iteratively moved, there is a need for continuous monitoring of performance and costs. Key questions may include:

  • “Is the migration successful?”
  • “How does the performance compare to on-premises?”
  • “Have I correctly assessed all the critical dependencies and service mapping?”

Once workloads are in production on the cloud, key considerations include:

  • “How do I continue to optimize for cost and for performance to guarantee SLAs?”
  • “How do I ensure Ops teams are as efficient and as automated as possible?”
  • “How do I empower application owners to leverage self-service to solve their own issues easily to improve agility?”

The challenges of managing disparate modern data stack technologies, both on-premises and in the cloud, can be solved with a comprehensive approach to operational planning. In this blog series, we will dive deeper into each stage of the cloud adoption lifecycle and provide practical advice for every part of the journey. Upcoming posts will outline best practices for the planning, migration, operation, and optimization phases of this lifecycle.

About HDInsight application platform

The HDInsight application platform provides a one-click deployment experience for discovering and installing popular applications from the modern data stack ecosystem. The applications cater to a variety of scenarios such as data ingestion, data preparation, data management, cataloging, lineage, data processing, analytical solutions, business intelligence, visualization, security, governance, data replication, and many more. The applications are installed on edge nodes which are created within the same Azure Virtual Network boundary as the other cluster nodes so you can access these applications in a secure manner.

 

The post ‘Transitioning big data workloads to the cloud: Best practices from Unravel Data’ by Ashish Thapliyal, Principal Program Manager, Microsoft Azure HDInsight, appeared first on Microsoft.

 

 

The post Transitioning big data workloads to the cloud: Best practices from Unravel Data appeared first on Unravel.

]]>
https://www.unraveldata.com/transitioning-big-data-workloads-cloud-best-practices/feed/ 0