At Foundations 2025, Andrew Chabot of FinThrive and Eric Tome of Databricks offered a real-world demo of combining a governed CData Software semantic layer with Databricks Genie to accelerate time-to-insight, enable smarter automation, and support rapid experimentation.

“Does Databricks understand what's behind? No. Databricks doesn't care. Databricks just knows assets, locations, and sensors right here. And what's great about some of this Data Shop functionality is if I go in here and you click to see Python, it'll pull up this Python right here and pretty much give you the code that you need to run in Databricks to make your connection and actually pull data out of the endpoint and hydrate your lakehouse.” -- Andrew Chabot, FinThrive

Access their session in FULL, along with other insights from Foundations speakers: https://bit.ly/44K8MbW

#CData #CDataFoundations #Databricks #DatabricksGenie #Lakehouse #SemanticLayer #DataVirtualization
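As a rough illustration of the kind of generated connection code Andrew describes, here is a hedged PySpark sketch that pulls a governed table over JDBC and lands it as a Delta table. The connection string, driver class, and table names are placeholders, not the actual code CData Data Shop produces.

```python
# Hypothetical sketch: pull a governed table from a JDBC endpoint into the lakehouse.
# The URL, driver class, and table names below are placeholders, not CData's generated code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:..."  # placeholder connection string supplied by the semantic layer

assets_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "Sensors")            # hypothetical governed view
    .option("driver", "com.example.Driver")  # placeholder driver class
    .load()
)

# Hydrate the lakehouse: land the pulled data as a Delta table for Genie to query.
assets_df.write.format("delta").mode("overwrite").saveAsTable("iot.sensors_bronze")
```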
Databricks is shipping fast on Azure 🚀 A few highlights I’m excited about:

• Databricks One (Public Preview): a simpler UI that puts AI/BI and apps in one place.
• Lakeflow Pipelines Editor (Public Preview): Python/SQL file-first pipelines and easier debugging.
• New system tables: pipeline update history and data classification results for governance.
• Delta Sharing upgrades: share federated (foreign) tables across workspaces.
• Databricks SQL upgrades: semantic metadata in metric views, UTF8 collation LIKE, the new spatial ST_ExteriorRing function, multi-variable DECLARE, TEMPORARY metric views, and streaming WITH options.
• Heads-up: upcoming time-travel/VACUUM behavior changes in DBR 18.0 (a rough sketch of that interplay below).

#AzureDatabricks #DatabricksSQL #UnityCatalog #DeltaLake #Lakehouse #DataEngineering
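Since that last heads-up touches behavior many jobs rely on, here is a small, hedged sketch of how Delta time travel and VACUUM retention interact today. The table name and retention window are invented, and the exact semantics in DBR 18.0 should be checked against the release notes.

```python
# Rough sketch of the time-travel / VACUUM interplay the DBR 18.0 note is about.
# Table name and retention values are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the table's version history.
spark.sql("DESCRIBE HISTORY sales.orders").show(truncate=False)

# Time travel to an earlier version (only possible while its files are still retained).
old_df = spark.sql("SELECT * FROM sales.orders VERSION AS OF 42")

# VACUUM removes files older than the retention window, which limits how far back
# time travel can reach -- the piece whose behavior is slated to change in DBR 18.0.
spark.sql("VACUUM sales.orders RETAIN 168 HOURS")
```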
Day 1 of #100DaysOfDataEngineering 🚀

Today, we’re focusing on variables and naming conventions, small things that make a big difference in data engineering. Variables store data, intermediate results, and configuration values, helping you build clean, reusable, and scalable pipelines. But the real magic comes when you name them clearly:

✅ Use descriptive names like customer_count, cleaned_sales_df, or aggregated_orders.
✅ Classes should be in CamelCase: TransactionPipeline, CustomerData.
✅ Constants in uppercase: MAX_RETRIES, DEFAULT_PATH.
✅ Avoid vague names like x or data1; clarity matters for collaboration.

Good naming isn’t just style: it means readability, maintainability, and fewer bugs, and it makes your code far easier for other developers to pick up and build on (see the short sketch below). In large pipelines, clear variable names help your teammates understand your logic instantly, and they make debugging a lot easier. Remember, consistent naming today saves hours of headaches tomorrow! 💡

#Python #DataEngineering #BestPractices #ETL #DataPipelines #CleanCode #LearningJourney #100DaysOfDataEngineering
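A minimal Python sketch of those conventions in one place; the pipeline, table, and column names are made up for illustration.

```python
# Illustrative only: the pipeline, path, and column names are invented.
MAX_RETRIES = 3                      # constants in UPPERCASE
DEFAULT_PATH = "/mnt/raw/sales/"

class TransactionPipeline:           # classes in CamelCase
    def __init__(self, source_path: str = DEFAULT_PATH):
        self.source_path = source_path

    def run(self, raw_sales_df):
        # Descriptive names make each step self-explanatory (raw_sales_df is a pandas DataFrame).
        cleaned_sales_df = raw_sales_df.dropna(subset=["order_id"])
        aggregated_orders = (
            cleaned_sales_df.groupby("customer_id")["amount"].sum().reset_index()
        )
        customer_count = aggregated_orders["customer_id"].nunique()
        return aggregated_orders, customer_count
```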
Pandas or PySpark… which one should you ACTUALLY use?

Every data engineer has asked this question at least once. Let’s break it down in a real, no-fluff way:

𝐏𝐚𝐧𝐝𝐚𝐬 = Fast, simple, perfect for small to medium data
𝐏𝐲𝐒𝐩𝐚𝐫𝐤 = Distributed, scalable, built for BIG data

The smartest teams? They use BOTH strategically (see the hybrid sketch below).

In this carousel, we’ll show you:
➡ When Pandas is the right choice
➡ Where Pandas fails
➡ Why PySpark saves the day
➡ And the BEST hybrid approach used by top companies in 2025!

Want scalable, high-performance data pipelines? That’s exactly what we build at #ShrijanTechnology

Check the slides and tell us in the comments: What are YOU currently using, Pandas or PySpark?

#Pandas #PySpark #BigData #DataEngineering #TechInsights #ShrijanTech #Scalability #Python
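As a hedged sketch of that hybrid approach, one common pattern is to let PySpark do the distributed aggregation and then hand the small result to Pandas for exploration. The table and column names are placeholders.

```python
# Hybrid pattern sketch: Spark does the distributed heavy lifting, Pandas takes the small result.
# Table and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# PySpark: scan and aggregate a large table across the cluster.
daily_revenue = (
    spark.table("sales.transactions")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Pandas: the aggregate is small, so pull it to the driver for quick analysis or plotting.
daily_revenue_pd = daily_revenue.toPandas()
print(daily_revenue_pd.describe())
```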
What Actually Happens Inside a Data Pipeline – a Simple Breakdown

Without clean, standardised data, dashboards don’t provide any value. The real impact starts with a well-structured data pipeline.

When I first started working with data, the term “data pipeline” sounded intimidating, like something only big tech companies handled. After building pipelines for AI automation and BI dashboards, I realized it’s really just a systematic flow of data from source to insight.

Here’s a simple breakdown: 👇

#DataEngineering #DataAnalytics #AIAutomation #DataPipelines #PowerBI #Python #Azure #ETL #AnalyticsEngineering #GenerativeAI
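As a minimal illustration of that source-to-insight flow, here is a hedged Pandas sketch of extract, transform, and load; the file paths and columns are invented.

```python
# Minimal ETL sketch: source file -> cleaned, standardised table for the BI layer.
# Paths and column names are invented for illustration.
import pandas as pd

# Extract: read raw data from a source system export.
raw_orders = pd.read_csv("raw/orders.csv", parse_dates=["order_date"])

# Transform: standardise and clean before anything touches a dashboard.
orders = (
    raw_orders
    .dropna(subset=["order_id", "amount"])
    .assign(amount=lambda df: df["amount"].astype(float))
    .drop_duplicates(subset=["order_id"])
)

# Load: write an analysis-ready table the dashboard can consume.
orders.to_parquet("curated/orders.parquet", index=False)
```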
🧠💭 It’s been a while since I dropped some data talk here… Truth is, I haven’t been building new pipelines lately 😅 But guess what? Sometimes the best way to level up isn’t by doing more, it’s by thinking smarter.

Lately, I’ve been revisiting the basics: data modeling, architecture flow, why a small design choice can make or break an entire system, and strengthening my core database and coding skills. Crazy how revisiting fundamentals can give you new insights, right?

Next on my radar 👇
⚙️ Creating and optimizing data pipelines
⚡ Trying my hand at real-time streaming data
🎯 Building scalable systems (without losing sleep 😴)

So yeah, a little quiet, but definitely cooking something behind the scenes 🍳 How do you usually reset your learning mode when you hit pause?

#DataEngineering #LearningJourney #Python #Azure #CareerGrowth #TechHumor #DataTalk
A Roadmap to the different opportunities in the world of Data Science.

Start with the fundamentals: the basics of math, Python, SQL, and version control. Then pick a path to specialize in: data engineering, data analytics, or machine learning. The map shows how these pieces connect and why crossing between them matters.

Data engineering sets up the data storage and processing, data analytics turns data into actionable insights, and machine learning adds predictive power. Deployment brings models and dashboards into production and real-world use. The goal is to become a true data science expert who can own end-to-end solutions.

Which track are you focusing on this year, and how will you connect the dots across the stack to deliver real impact?

More details: https://lnkd.in/eXWi7s-G

#DataScience #DataEngineering #DataAnalytics #MachineLearning #Deployment #CareerPath #FullStackDataScience
🐼 Pandas vs PySpark: Same Goals, Different Scales! ⚡

Every data engineer or data analyst hits this moment: your Pandas code runs perfectly on small data… but then you try the same on millions of rows 😅 That’s where PySpark steps in: same logic, but built to handle massive-scale data with distributed computing.

Here’s the real deal 👇
📊 Pandas → Best for small to medium datasets, quick exploration, local analysis.
🔥 PySpark → Built for big data, parallel processing, and cluster environments.

In short:
➡ Start with Pandas to understand data (see the side-by-side sketch below).
➡ Move to PySpark when your laptop fan starts sounding like a jet engine 🚀

#Pandas #PySpark #DataEngineering #BigData #DataAnalytics #MachineLearning #Python #Spark #ETL #DataScience #AnalyticsLife #DataFrame #Coding #learning
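To make the "same logic, different scale" point concrete, here is a hedged side-by-side sketch of the same aggregation in both libraries; the file and column names are placeholders.

```python
# Same logic, two scales -- file and column names are placeholders.
import pandas as pd
from pyspark.sql import SparkSession, functions as F

# Pandas: fine while the CSV fits comfortably in one machine's memory.
sales_pd = pd.read_csv("sales.csv")
top_products_pd = sales_pd.groupby("product")["amount"].sum().nlargest(10)

# PySpark: identical intent, but executed in parallel across a cluster.
spark = SparkSession.builder.getOrCreate()
sales_sp = spark.read.csv("sales.csv", header=True, inferSchema=True)
top_products_sp = (
    sales_sp.groupBy("product")
    .agg(F.sum("amount").alias("total"))
    .orderBy(F.desc("total"))
    .limit(10)
)
```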
𝗪𝗵𝗲𝗻 𝘀𝗵𝗼𝘂𝗹𝗱 𝘆𝗼𝘂 𝘂𝘀𝗲 𝗣𝘆𝗦𝗽𝗮𝗿𝗸 𝗶𝗻𝘀𝘁𝗲𝗮𝗱 𝗼𝗳 𝗣𝗮𝗻𝗱𝗮𝘀?

While I was working at 𝗝𝘂𝘀𝘁𝗔𝗱𝘀, we needed to generate millions of ad creative copies from massive XML files, gigabytes in size. We started with Pandas. At first, it worked. But as the data kept growing, our server memory began maxing out. Processing that should have taken minutes was running into hours, and scaling further felt impossible. That’s when the team realized Pandas wasn’t built for this scale.

𝗪𝗶𝘁𝗵 𝗣𝘆𝗦𝗽𝗮𝗿𝗸:
• The workload was distributed across a cluster
• We could process huge XML files without memory bottlenecks
• Generating creatives became much faster and more reliable

𝗞𝗲𝘆 𝗹𝗲𝘀𝘀𝗼𝗻 𝗳𝗿𝗼𝗺 𝘁𝗵𝗮𝘁 𝗽𝗿𝗼𝗷𝗲𝗰𝘁:
• Pandas is best for smaller datasets that fit in memory, useful for exploration and prototyping
• PySpark is built for large-scale, distributed processing of gigabytes to terabytes in production workloads

#𝗣𝘆𝗦𝗽𝗮𝗿𝗸 #𝗣𝘆𝘁𝗵𝗼𝗻
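A hedged sketch of the pattern described above, assuming the spark-xml package is available on the cluster; the paths, row tag, and column names are placeholders, not JustAds code.

```python
# Sketch: read multi-gigabyte XML in parallel with spark-xml instead of loading it
# into one machine's memory. Paths, row tag, and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

ads_df = (
    spark.read.format("xml")              # requires the spark-xml package on the cluster
    .option("rowTag", "creative")         # hypothetical element that marks one record
    .load("s3://example-feeds/*.xml")     # placeholder input path
)

# Transformations run distributed across the cluster, then land as Parquet.
ads_df.select("id", "headline", "landing_url").write.mode("overwrite").parquet(
    "s3://example-output/creatives/"
)
```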
Predictive Anomaly Detection for Data Center Assets! 🚀

Excited to share a project that proactively identifies hardware anomalies using simulated data center telemetry and powerful machine learning! This solution employs Isolation Forest and a Keras Autoencoder for effective anomaly detection, paving the way for better predictive maintenance. 💡 The project includes robust feature engineering (lag/rolling features) and clear training scripts.

Quick Start:
pip install -r requirements.txt
Run src/generate_synthetic.py, then src/preprocess.py, and finally src/train_and_evaluate.py.

🔗 GitHub Repository: https://lnkd.in/gmr_U9g5
🔗 Live Streamlit App: https://lnkd.in/gtRhuWKB

Check it out and let me know your thoughts! 👇

#AnomalyDetection #MachineLearning #PredictiveMaintenance #DataCenter #Python
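Not the repository's code, just a hedged sketch of the same idea: lag and rolling features on telemetry, then Isolation Forest flags the outliers. The column names and contamination rate are invented.

```python
# Sketch only: lag/rolling feature engineering plus Isolation Forest on telemetry.
# Column names and parameters are invented, not taken from the linked repo.
import pandas as pd
from sklearn.ensemble import IsolationForest

telemetry = pd.read_csv("telemetry.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# Feature engineering: lag and rolling statistics over a temperature signal.
telemetry["temp_lag_1"] = telemetry["temperature"].shift(1)
telemetry["temp_roll_mean"] = telemetry["temperature"].rolling(window=12).mean()
telemetry["temp_roll_std"] = telemetry["temperature"].rolling(window=12).std()

feature_cols = ["temperature", "temp_lag_1", "temp_roll_mean", "temp_roll_std"]
features = telemetry.dropna(subset=feature_cols)[feature_cols].copy()

# Isolation Forest: -1 marks points the model considers anomalous.
model = IsolationForest(contamination=0.01, random_state=42)
features["anomaly"] = model.fit_predict(features[feature_cols])
```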
As year end approaches, many data teams are beginning to decide which projects will make their 2026 roadmaps. If improving on legacy orchestration is on your radar, check out this blog from Eric Thomas.
Eric Thomas took an excellent lakehouse tutorial built with Airflow and rebuilt it with Dagster. The stack is the same: MinIO, Trino, Iceberg, and dbt. The orchestrator is different.

The results were striking:
→ Event-driven sensors replaced time-based scheduling. Pipelines run when data arrives, not on a clock.
→ Smart partitioning enabled backfills and selective reruns. No more all-or-nothing processing.
→ Asset checks created multi-layered quality validation. Data quality became programmatic, not just hoped for.
→ Pure SQL patterns eliminated Python bottlenecks. Trino handles the heavy lifting.

The lakehouse provides the foundation, but the orchestration layer determines how effectively teams can actually use it. The original tutorial teaches lakehouse fundamentals beautifully. This comparison shows how much orchestration choice matters for production readiness.

Check out the full blog today! Link in the comments.
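For readers new to Dagster, here is a hedged, simplified sketch of two of the patterns the post calls out (software-defined assets and asset checks). It is not code from Eric's blog; the asset and check names are invented, and it assumes a recent Dagster version where an asset check can load the asset value by parameter name.

```python
# Simplified sketch of a software-defined asset plus a programmatic quality check.
# In the real stack the asset would query Trino/Iceberg; here a stand-in DataFrame.
import pandas as pd
from dagster import asset, asset_check, AssetCheckResult

@asset
def daily_orders() -> pd.DataFrame:
    # Placeholder data standing in for a Trino/Iceberg query result.
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.25]})

@asset_check(asset=daily_orders)
def orders_are_not_empty(daily_orders: pd.DataFrame) -> AssetCheckResult:
    # Data quality as code: the check result is recorded alongside the asset in Dagster.
    return AssetCheckResult(passed=len(daily_orders) > 0)
```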