July 6, 2023

Here’s What You Missed at the Databricks Data + AI Summit

When it comes to data and AI, the pace of change can feel impossible to keep up with. But in some ways, last week’s Databricks Data + AI Summit made things feel slower: as it turns out, AI and machine learning have been on the move for decades, and understanding their place in history is key. Sigma joined more than 6,000 other data and machine learning practitioners at the event, where we hosted a session on using input tables, which just launched on Databricks, to write data back to the cloud data warehouse.

At the event, Databricks CEO Ali Ghodsi announced LakehouseIQ and Lakehouse AI, which will allow companies to build ML models using their own enterprise and proprietary data on Databricks’ platform.

Ghodsi was quick to emphasize the difference between products that have an AI assistant built in and Databricks’ new approach of building an engine into the data platform itself, particularly one that can build models on a company’s proprietary data and learn from it. Databricks’ bet is that as AI assistants become the norm, the industry will continue shifting to a build-your-own-model approach.

The Summit made one thing incredibly clear: machine learning models specific to each enterprise are where the entire industry is headed. “Whoever solves this problem will own the whole data analytics space completely in the future,” Ghodsi said.

At Sigma, Mitch Ertle, Partner Solution Engineer for Databricks, will dive deeper next week into the technical aspects of these Databricks announcements, as well as what they mean for Sigma going forward.

The Convergence of AI and BI

Databricks CEO Ali Ghodsi announced the company’s shift to building ML models on enterprise data at the first keynote on June 28.

Ghodsi also shared that the company’s Unity Catalog, which is the governance layer of the Databricks lakehouse, has become the company’s top priority, since you can store not just data but also ML models within it.

“That’s what enterprises really need, or they can’t lock down the security,” Ghodsi observed at the partner summit event on Tuesday. “We absolutely embrace the ecosystem of helping organizations build their own ML models and their own Large Language Models (LLMs) that they can own, that they can control the weights and IP to.”

Databricks CTO and co-founder Matei Zaharia compared getting ML models built for enterprises to helping companies get on the web decades ago, describing it as “a unique chance to do that.”

MosaicML—which Databricks recently announced the intent to acquire—led a breakout session on how to easily train and serve AI models on company data in secure environments.

“There’s a perception that building your own AI is expensive and hard,” shared Hanlin Tang, the CTO of MosaicML. “We’re trying to change that narrative… Specialized models can have strong business value.” Governance and security are the biggest reasons companies will need their own models, he added.

“People are trying to beat the bad behavior out of the model,” Tang said of large open source models. Instead, “You want to have fine control over the data that goes into your model.”

Lin Qiao, the former head of PyTorch and current CEO of Fireworks, discussed how models can now be customized with much less data than before. “There will not be one model provider dominating the space,” she said. “There will be many.”

As for Sigma, we just announced our AI capabilities with Sigma AI, which you can learn more about here. The feature will include AI-generated input tables for tasks such as sentiment analysis, auto-fill & clean, and classification, as well as natural language workbooks.

The Summit expo floor included live demos from dozens of analytics companies racing to capture the data and AI market.

Marc Andreessen Explained Why Now

At Thursday’s keynote, Marc Andreessen, general partner of the famed venture capital firm Andreessen Horowitz, explained why generative AI has exploded so rapidly at this moment in history.

“We had to get the internet to scale,” he said. “Recent breakthroughs on large language models and open source foundation models like ChatGPT have made what’s happening now possible because we have a lot more data.”

He also spoke to the fears of AI taking over everyone’s jobs, particularly engineering and programming work.

“I think most programmer jobs—most activity—is going to go up a level,” he said. Instead of writing code, programmers will be managing AI that’s writing the code. The roles will essentially change so that programmers will “oversee effectively an army of AI.”

Andreessen remained optimistic on the future of the industry. “It’s a zero sum view of the world—where there’s a certain amount of work to be done, and if machines do the work then humans don’t have anything to do,” Andreessen said. “What ends up happening is actually very much the opposite. When machines can take over things from people, you free people up to do more valuable things… Nobody ever runs out of ideas for what they want software to do. What they run out of is the time and the resources to actually build the software that they want.”

Other keynote speakers nodded to the fact that artificial intelligence is far from new. In fact, it’s been at least an 80-year journey: the idea of artificial intelligence first emerged in the 1930s and 40s, and the original neural network paper was written in 1943. That means a big part of the industry, at least, has been prepared for this moment.

Privacy and Security

Finally, regulation, governance, and privacy ran through almost every part of the Summit. One of the key policymakers in the EU working on the EU AI Act, Matteo Quattrocchi (Policy Director in EMEA for BSA, The Software Alliance), led a breakout session on the challenges of working on the regulation.

“The biggest difference is what happens with foundation models,” Quattrocchi said about the future of the policy. “The EU AI Act doesn’t necessarily distinguish between open source or not open source. You have to have a degree of control downstream that you can’t really have.”

The EU AI Act is expected to be the first piece of regulation in the AI space, and will give companies several years to comply with any newly established rules.

Eric Schmidt, Google’s former CEO, assured the audience that the biggest risks of AI would be regulated.

Schmidt also delivered a mic-drop moment on AI fears during Thursday’s keynote: “Every single politician that I speak with, every single leader I talk to, is now an expert in AI,” he joked. “And they know nothing about what you all are doing. Maybe that’s always been true, but it’s sort of alarming to me.”

He offered reassurance that AI systems would be regulated against the biggest threats, and said there already are guardrails to constrain models around human values. “The simplest way to frame it is: there are scenarios of extreme risk,” Schmidt said. “And these systems are going to get regulated around extreme risk.”

The Summit’s key speakers made the following clear: while we’ve reached a defining point in building AI systems, building AI on proprietary data will take time. Despite recent advances that make it easier and faster to build LLMs, companies’ existing infrastructure cannot change overnight.

Download a free trial of Sigma here.
