Cloud Database Report

Hi everyone! This is an update to my recent blog post on the final days of the legacy data warehouse (link below).

The topic of legacy data warehouses slowly fading away struck a chord with many readers. Now we have updates from Snowflake and Teradata.

On Aug 24, the same day I published “The Final Days of the Legacy Data Warehouse,” Snowflake announced its earnings for Q2 FY2023. Not surprisingly, a question about legacy systems came up during Snowflake’s earnings call. One financial analyst asked Snowflake CEO Frank Slootman about the level of activity of customers migrating from on-premises systems to Snowflake’s data cloud.

Slootman: “In the last week, I've heard two very, very iconic names in two different industries that were staunch on-premises people, who would never ever go cloud, and that are now going [cloud]. So I just feel that the resistance is completely breaking….A lot of this is that they’re going to get left behind. You can’t take advantage of innovations that are only available on the cloud. We’re going to see acceleration out of this.”

Is he right? I have no doubt that he is.

According to Ocient, 59% of respondents to its survey are actively looking to switch data warehouse providers. They specifically named IBM, Cloudera, and Teradata as the top 3 legacy environments that data managers want to move away from.

Their reasons:

· 40% want to modernize their legacy platforms

· 42% feel their existing system isn’t comprehensive enough, and

· 36% say it’s not flexible enough

This explains why Snowflake, with its data cloud and data marketplace, has become such a force in the market. Other disruptors are Databricks, Firebolt, SingleStore, TileDB, Yellowbrick, and of course AWS, Google, and Microsoft.

I would include Ocient as well, with its hyperscale data warehouse platform, which is capable of analyzing trillions of records.

The old guard responds

Where does that leave traditional data warehouse providers—companies like IBM and Teradata? They know that their customers want newer, cloud-native platforms. And they’re taking steps to modernize their offerings.

That brings me back to Teradata, which recently made a product announcement that is relevant to this whole discussion.

Teradata is synonymous with the older data warehouses that many organizations are looking to replace. But Teradata is fighting back, as SVP Ashish Yajnik described to me in an earlier Cloud Database Report podcast conversation (link below).

Teradata’s new cloud-native architecture

Now, Teradata has just introduced VantageCloud Lake, a new and improved cloud data warehouse that is based on a cloud-native architecture, with modern capabilities like object storage in the cloud, auto scaling, and self-service. It is available in AWS now and will soon be available in other clouds.

So the decision to move to a cloud data warehouse is getting easier, but also harder in some respects.

Easier because that’s the inevitable direction the industry is heading. For CIOs and CTOs the question is when, not if.
Harder because incumbent vendors like Teradata are not standing by while Snowflake and Databricks pick off their installed base. They’re responding with cloud-native platforms of their own.

Who will be the next leaders in this fast-changing market? We’ll have to wait a while longer for the query results on that question.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit clouddb.substack.com


Has the database market attempted to solve data complexity — only to create even more complexity?

That’s the argument of Raj Verma, CEO of SingleStore, who thinks he has the answer to the plethora of databases found in many of today’s IT environments: one database that can handle operational data, analytics, and many different data types in a single, unified platform.

It’s not a new idea: the database industry went down the path of “universal databases” back in the 1990s (e.g., Illustra, Informix), and SingleStore isn’t the only vendor with an all-purpose DBMS. But the company is establishing its database as a viable solution among the many that are out there. For that reason, I added SingleStore to the Cloud Database Report’s Top 20 list earlier this year (see below).

You may remember SingleStore by its former name, MemSQL. The company was rebranded in 2020, and has been growing, expanding, and building its database for modern applications.

On the latest episode of the Cloud Database Report podcast, I talked to CEO Raj Verma about the rebranding of SingleStore, multi-model databases, the competitive landscape — and Verma’s ambitious goal of being on the short list of preferred database providers for large organizations.

“We feel that enterprises will spend 95% of their database dollars on probably three companies in the future,” Verma says. “And we want to be one of them.”

Recent moves

SingleStore has hired two Microsoft veterans to lead engineering and product development. Shireesh Thota joins as SVP of engineering to oversee development of the company’s multi-model SQL database. And Yatharth Gupta heads up product management and design as VP of product management.
In an expanded partnership, IBM has agreed to license and support the SingleStore database. SingleStore was already available via IBM’s Cloud Pak for Data and in the Red Hat Marketplace. IBM has also become an investor in SingleStore.
Last September, SingleStore announced $80 million in Series F funding. Investors include Dell, HPE, and Google Ventures, among others.

As you can see, SingleStore is associating itself with some of the biggest names in enterprise tech. While that doesn’t assure success, it certainly lends credibility to its unified database proposition and strategic direction.

All of which serves as the backdrop for my conversation with Raj Verma.

Key topics from the interview include:

The rebranding of SingleStore
How 'Database 3.0' is different from earlier eras
What is data intensity?
All-purpose databases vs. purpose-built DBMSs
What organizations can do to simplify database sprawl
Rethinking the post-pandemic workplace
What’s next for SingleStore

Quotes from the podcast:

“Our mission is very simple. It is to unify and simplify modern data.”
“The volume, variety, and velocity of data just inundated enterprise organizations.”
“We feel the future will belong to a database that can combine a vast majority of workloads in a hybrid, multi-cloud environment.”
“The personality of data is ever evolving.”
“The shelf life of data is going down dramatically, and the volume is increasing. So without speed, you're going to be done, you know what I mean?”
“This convergence of databases is a foregone conclusion, in my opinion....I am fairly confident that there will be a massive consolidation in the database space.”


This audio article was originally published by the Cloud Database Report on March 2, 2022.

Gartner’s Magic Quadrant has long served as a proof point of a vendor’s relevance in its respective market. But what about those that don’t make it into the quadrant? Here are my observations about six key players—DataStax, Micro Focus, MongoDB, Neo4j, Yellowbrick Data, and Yugabyte—that were not included in Gartner’s Cloud Database MQ for 2021.

You can listen here, or read the full story below.


Many of Teradata's customers continue to manage enterprise data warehouses on premises, while transitioning to cloud services over months or years. Yajnik is responsible for Teradata’s product transformation to the cloud, which is a high priority as the company repositions its data warehouse platform for use in hybrid and multi-cloud environments.

Over the past few months, Teradata has struck industry partnerships with AWS and Microsoft Azure. Recent customer announcements include Telefonica, Volkswagen, and Tesco.

Key topics from the interview include:

Teradata's priorities for the year ahead
Strategic collaboration with AWS on product development and integration of Vantage on AWS
Expanding use of AI & ML in Teradata environments
Customer projects, including Volkswagen for smart factories
What Teradata is doing to enable increased data sharing
Teradata’s core strengths in this fast-changing competitive market

Quotes from the podcast:

"What we are embarking on is to make this whole multi-cloud journey much more intelligent and not so accidental for our customers."
"Our customers require a unified architecture from both companies [Teradata and AWS] in order to modernize and build their data and analytics platform."
"We are seeing a ton of interest in the analytics roadmaps, especially in the context of these industry data models."
"We've seen customers go to competitors, hit a brick wall in terms of their scaling needs, and come back to Vantage."
"Not all analytics are created equal."


2021 was a busy year for cloud databases, with startups like Cockroach Labs, DataStax, and SingleStore challenging larger, established vendors like Oracle, IBM, and SAP, and of course the Big 3 cloud providers: Microsoft, AWS, and Google Cloud.

There’s a lot of momentum carrying into 2022. A few observations on products and platforms.

First, I expect we will see more exabyte-size databases, which are 1,000 times larger than the petabyte databases that many businesses operate today. We’re moving into the realm of extreme data, and that’s going to require even greater scalability than most companies have experience with. That will be a challenge.
Second, database migrations from on-premises systems to the cloud will continue to be a major trend. They are not always easy, and can take weeks or even months to complete, so they will require new tools and services.
Third, database management is getting easier. Cloud database providers have begun offering fully managed services, "serverless" capabilities, and autonomous databases, all of which reduce the amount of provisioning and hands-on management required.
And finally, more business people will begin to pay attention to who has access to data and where data is stored, which means conversations about governance and data distribution will increasingly become line-of-business conversations.

A few comments about the competitive landscape. I see 3 major trends.

"Immovable objects meet irresistible forces." Immovable objects are the deeply rooted vendors like Oracle and IBM, and irresistible forces are the cloud-native startups. These emerging companies are coming on strong, and the old guard must continue reinventing themselves.
The Big 3 cloud providers are the new center of gravity for data management. AWS, Google Cloud, and Microsoft Azure have momentum with their portfolios of purpose-built databases, and other cloud services like analytics and AI.
And last, Snowflake, with its data cloud model, has leapfrogged old-style centralized data warehouses. I expect more database providers to offer their own Snowflake-like services.

For more on the latest trends in the cloud database market, register for Acceleration Economy's Cloud Database Battleground on January 27, 2022. The digital event will be hosted by John Foley, editor of the Cloud Database Report and database analyst with Acceleration Economy. Registration is free.

Participating companies include Couchbase, Cockroach Labs, DataStax, Redis, SingleStore, and Yugabyte. Each vendor will answer the same five questions:

How does your database help organizations manage data at scale and speed to lead their industry?
When customers talk about becoming a data-driven organization and creating new revenue streams with data, how do you help them make that a reality?
What are the top reasons developers and IT teams want to use your cloud database for the first time?
In what ways does your cloud database simplify data distribution and sharing across hybrid, multi-cloud, and edge environments?
How does your cloud database provide a trusted data environment through access, security, privacy, and governance controls?


Ocient is a software startup that specializes in complex analysis of the world's largest datasets. Early adopters are hyperscale web companies and enterprises that need to analyze data sets of billions or trillions of records.

Prior to Ocient, Gladwin was the founder of object storage vendor Cleversafe, acquired by IBM in 2015. That experience with mega-size data storage carried over to Ocient, whose software is optimized to run on NVMe solid state storage, industry-standard CPUs, and 100 Gb networking.

John Foley is editor of the Cloud Database Report and senior analyst with Acceleration Economy.

Key topics from the interview include:

Ocient is focused on very large datasets—petabytes, exabytes, and trillions of rows of data
Leading use cases include digital ad auctions, telecom network traffic, and vehicle fleets
Ocient uses a compute-adjacent architecture with storage and compute in the same tier
Ocient is available on premises, in the cloud, and as a managed service
What’s ahead for Ocient in 2022

Quotes from the podcast:

"Our focus is on complex analysis of at least hundreds of billions of records, if not trillions or tens of trillions or hundreds of trillions. That's territory that was previously impossible."
"Billions is kind of the last scale at which humans can actually make or touch data that big. It's very hard to do, but it's possible. But at trillions scale, it's just not possible."
"I've challenged people to give me an example of some new technology, some new version of something that makes less data than the version it replaces."
"5G is arguably the largest technology infrastructure investment ever. It's going to create a whole lot more data, at least 10 times the amount of data, for everything."
"What we see is, over time, data analysis is going to occur on these hyperscale systems."


Yellowbrick Data is a 7-year-old startup that continues to grow in the highly competitive cloud data warehouse market. Yellowbrick recently raised $75 million in its latest round of capital funding as it expands into a variety of industries, including telecom, healthcare, retail, and manufacturing.

Yellowbrick describes itself as a cloud-native data warehouse. It is available for deployment on premises and in hybrid cloud and multi-cloud environments.

Key topics from the interview include:

What makes a database or data warehouse cloud native? APIs, open source, storage tiers, networking. How does Yellowbrick define it?
One of the key things with cloud-native data warehouses is the separation of storage and compute. It gives you scalable storage and dynamic compute resources.
Not all approaches to storage/compute are the same. Yellowbrick has published a white paper that defines six different levels of storage/compute separation.
There are performance and workload advantages, but also important considerations around cost.

Quotes from the podcast:

"The separation of storage and compute is table stakes for cloud data warehouses today."
"The ultimate goal is a data warehouse that provides the same cloud experience wherever you need to deploy it for business needs or business reasons. That could be data sovereignty, data gravity, regulations, security, latency and things like that, but provide the same easy-to-consume experience throughout."
"We're addressing two problems: One, software in data warehouses is not as efficient as it could be. And second, there's a lot of unpredictability around the costs of running these systems."
"Democratization of data and analytics is a key trend. And making a self-service experience for line-of-business users is critical."


With a PhD in Computer Science and Engineering from the Hong Kong University of Science and Technology, Papadopoulos worked as a research scientist at Massachusetts Institute of Technology and Intel Labs prior to launching TileDB. As he explains in this interview, the idea for TileDB originated in that research work in emerging big data systems and the hardware requirements to support those workloads.

Universal databases are not new, but they are re-emerging as an alternative to the single-purpose databases that have become popular in the tech industry.

Key topics from the interview include:

TileDB stores data in multi-dimensional arrays, or matrices. The data types and workloads it supports.
How TileDB differs from object-relational universal databases of a generation earlier.
How TileDB compares to purpose-built databases – time-series, graph, document, vector, etc.
Use cases and early adopters.
TileDB’s availability as a cloud service and for use on-premises.

Quotes from the podcast:

"These ideas were shaped based on interactions we had with practitioners and data scientists across domains. That was key. We did not delve into the traditional, relational query optimization and SQL operations that other people were doing with different architectures in the cloud."
"I was very drawn to scientific use cases like geospatial and bio-informatics. And it came as a great surprise to me that none of those verticals and applications were using databases."
"Is there a way to build a single storage engine to consolidate this data? A single authentication layer, a single access control layer, and so on. This is how it started."


Ranganathan discusses the design considerations that influenced development of YugabyteDB, including the learnings gleaned from the engineering team’s previous work at Facebook. YugabyteDB can be deployed on premises or as a cloud service. With built-in replication, YugabyteDB can be used to distribute data across geographic regions in support of data localization requirements and for high availability.

Key topics in the interview include:

The Yugabyte engineering team worked on the HBase and Cassandra databases at Facebook, experience that is now carrying over to the work they are doing at Yugabyte.
How YugabyteDB is different from other distributed SQL databases, including its support for both SQL and NoSQL interfaces.
Common use cases for YugabyteDB include real-time transactions, microservices, edge and IoT applications, and geographically distributed workloads.
YugabyteDB is available under the Apache 2.0 license and as self-managed and fully managed cloud services.

Quotes from the podcast:

“One of the important characteristics of transactional data is the fact that it needs to live forever.”
“We reuse the upper half of Postgres, so it literally is Postgres-compatible and has all of the features.”
“We said we're going to meet developers where they develop. We will support both APIs [SQL and NoSQL]. We're not going to invent a new API; that's what people hate.”
“It's not the database that people pay money for; it’s the operations of the database and making sure it runs in a turnkey manner that people really find valuable in an enterprise setting.”


In this episode of the Cloud Database Report Podcast, editor and host John Foley talks with Ciaran Dynes, Chief Product Officer of Matillion, about the process of integrating and preparing data for cloud data warehouses. Ciaran is responsible for product strategy and incorporating customer requirements into Matillion’s products, which include software tools for data integration and ETL/ELT.

Key topics in the interview include:

ETL, which stands for Extract, Transform, and Load, has been standard practice with on-premises data warehouses for 50 years. But ETL is changing in the cloud because data transformation happens in the cloud data warehouse, after data has been extracted and loaded. This new process is called ELT (Extract, Load, Transform).
Data must be integrated from myriad sources. Matillion says that many cloud data warehouses pull data from more than 1,000 databases, applications, and other sources.
Data quality is an ongoing challenge, but automation can help.
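
To make the ETL vs. ELT distinction above concrete, here is a minimal sketch of the ELT pattern. It is only an illustration, not Matillion's product or API: the in-memory SQLite database stands in for a cloud data warehouse, and the table names, sample rows, and the run_elt function are all hypothetical. The point is the order of operations: raw data is loaded untouched, and the cleanup runs afterward as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw records "extracted" from a source system (illustrative only).
RAW_ORDERS = [
    ("1001", " Alice ", "19.99"),
    ("1002", "bob", "5.00"),
    ("1003", "ALICE", "7.50"),
]


def run_elt(rows):
    """Load raw rows as-is, then transform them with SQL inside the warehouse."""
    # Assumption: an in-memory SQLite database stands in for a cloud data warehouse.
    warehouse = sqlite3.connect(":memory:")

    # Load: raw values land untouched, all as text, with no cleanup yet.
    warehouse.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: name cleanup, type casting, and aggregation all run in the
    # warehouse's own SQL engine, after the load step has finished.
    warehouse.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT LOWER(TRIM(customer)) AS customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total
        FROM raw_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )
    return warehouse.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    for customer, total in run_elt(RAW_ORDERS):
        print(customer, total)
```

In a classic ETL pipeline, the trimming and casting would instead happen in a separate tool before anything reached the warehouse. Keeping the raw table around is also what makes layered versions of the same data possible.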

Quotes from the podcast conversation:

“We’ve moved to this general concept of bronze, silver, and gold versions of data.”
“That’s the game we’re in — can we connect, combine, then synchronize back out into the operational system so we can take an action with a customer in real time?"
“Big data forced organizations to make data a board-level and executive-level conversation.”
“The culture of data is changing rapidly within companies.”


The adoption of cloud databases is accelerating, driven by business transformation and the need for database modernization.

In this episode of the Cloud Database Report Podcast, founding editor John Foley talks with Andi Gutmans, Google Cloud's GM and VP of Engineering for Databases, about the platforms and technologies that organizations are using to build and manage these new data environments.

Gutmans is responsible for development of Google Cloud's databases and related technologies, including Bigtable, Cloud SQL, Spanner, and Firestore. In this conversation, he discusses the three steps of cloud database adoption: migration, modernization, and transformation. "We're definitely seeing a tremendous acceleration," he says.

Gutmans talks about the different types of database migrations, from "homogeneous" migrations that are relatively fast and simple to more complex ones that involve different database sources and target platforms. He reviews the tools and services available to help with the process, including Google Cloud's Database Migration Service and Datastream for change data capture.

Gutmans provides an overview of the "data cloud" model as a comprehensive data environment that connects multiple databases and reduces the need for organizations to build their own plumbing. Data clouds can "democratize" data while providing security and governance.

Looking ahead, Google Cloud will continue to focus on database migrations, developing new enterprise capabilities, and providing a better experience for developers.


Alexa Weber Morales has years of experience writing about the developer community, cloud infrastructure, and database tools. She had a long career in tech journalism, including as Editor in Chief of Software Development magazine, prior to joining Oracle as a writer, editor, and content strategist.

In this podcast, John Foley, Editor of the Cloud Database Report, talks to Alexa about cloud-native database development, digital transformation, online education, and more. The conversation ranges from Kubernetes to Java development to building applications with Oracle's Apex low-code development platform. Alexa also talks about what motivates and inspires developers.

An interesting note about Alexa: she is also a Grammy Award-winning singer, songwriter, and musician known for her work in salsa jazz. In this podcast, Alexa talks about using online learning to write her first symphony.


Pinecone Systems' new vector database provides similarity search as a cloud service. Use cases include recommendations, personalization, image search, and deduplication of records.

A vector, or vector embedding, is a string of numbers that represents documents, images, or other data. Vectors are used in the development of machine learning applications. A vector database stores, searches, and retrieves the representations by similarity or by relevance.
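
For readers who want to see what "retrieves by similarity" means in practice, here is a minimal sketch. It is not Pinecone's API: the tiny three-dimensional vectors, the item IDs, and the cosine_similarity and top_k helpers are all made up for illustration, and the search is a brute-force scan using cosine similarity, which is one common similarity measure. Production vector databases typically rely on approximate nearest-neighbor indexes to do this at scale.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, return the k closest IDs."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings; real ones come from a machine learning model
# and usually have hundreds or thousands of dimensions.
INDEX = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-taxes": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], INDEX, k=2))
```

The query vector points in nearly the same direction as the two pet-related documents, so those rank ahead of the unrelated one. The same mechanism underlies recommendations, image search, and deduplication.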

Pinecone’s vector database is accessed through an API. Early adopters range from startups to large companies with machine learning initiatives that need to scale.

Pinecone Systems’ lead investor was also an early investor in Snowflake, and the similarities don’t stop there.

Cloud Database Report: Recent Episodes

John Foley