Introducing `marimo check`: a linter that gives agents the feedback they need to write production-ready data apps and pipelines, as well as exploratory notebooks, with marimo. https://lnkd.in/gyFYz56H

Companies large and small are replacing legacy data tools with AI-native workflows built on marimo: with agents, marimo lets companies go instantly from prompt to data app, pipeline, LLM eval notebook, and more. Developers and agents prefer marimo to Jupyter because it's code-first and productionizable, while agents wielding marimo make Python accessible to non-technical users who were previously limited by Looker's inflexible UI.

Until now, agents like Claude Code could execute marimo notebooks as scripts to get feedback on the correctness of their code. With `marimo check`, agents get additional feedback without running a single line of code, empowering them to self-correct on the fly.

In the coming weeks, we will have a lot more to say about how marimo is uniquely well-suited to empower both agents and humans to rapidly create production-ready data apps and pipelines ... see our latest newsletter for a sneak peek (https://lnkd.in/geQ85iXy).
You probably haven’t heard much yet about Claude Code hooks—but you should. They let you inject custom shell logic around Claude’s operations, making your AI assistant behave more like a trusted teammate than a creative wildcard.

With a PreToolUse hook, you can enforce strict guardrails—for example, only allow terraform plan, never permit terraform apply. You can outright block dangerous commands like tf apply or rm -rf before Claude ever runs them. After Claude issues a tool command, a PostToolUse hook can run linters, unit tests, or cleanups (run_in_background is an option to avoid blocking your next prompt). For asynchronous tasks, use Notification hooks. Want a ping on Discord or Slack when a heavy analysis is done? You can wire it through a shell command.

And because hooks are JSON-configured (and scopeable per project), you can commit them alongside your code so the whole team gets the same AI constraints. This isn’t just convenience—it’s about making behaviour deterministic. Rather than relying on Claude to “remember” to lint or test, hooks give you control, repeatability, and auditability.

Caution: hooks execute shell commands, so you must treat them like you would any script with permissions. Also, some users report flakes in hook triggering (especially under WSL2) or broken blocking semantics in PreToolUse.

Read more: https://lnkd.in/gDpUJTH3
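To make the guardrail idea concrete, here is a minimal sketch of a PreToolUse guard written in Python (your hook config would point Claude Code at this script for Bash tool calls). It assumes the hook receives the pending tool call as JSON on stdin, that Bash commands appear under tool_input.command, and that a blocking exit code (commonly 2) rejects the call and surfaces the stderr message back to the model; verify these details against the hooks docs linked above.

```python
#!/usr/bin/env python3
"""PreToolUse guard: block dangerous shell commands before Claude runs them.

Assumptions (check the Claude Code hooks docs): the pending tool call arrives
as JSON on stdin, Bash calls expose the command under tool_input.command, and
exiting with code 2 blocks the call and feeds stderr back to the model.
"""
import json
import re
import sys

BLOCKED_PATTERNS = [
    r"\bterraform\s+apply\b",   # only `terraform plan` is allowed
    r"\btf\s+apply\b",
    r"\brm\s+-rf\b",
]

payload = json.load(sys.stdin)
command = payload.get("tool_input", {}).get("command", "")

for pattern in BLOCKED_PATTERNS:
    if re.search(pattern, command):
        print(f"Blocked by policy: matched {pattern!r}", file=sys.stderr)
        sys.exit(2)  # assumed blocking exit code; adjust to the documented contract

sys.exit(0)  # allow everything else through
```

Because the script is just a file in your repo, it can be committed and reviewed like any other code, which is exactly the "same constraints for the whole team" point above.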
Automation in data engineering lags behind software, as data engineering is indeed less about coding per se and more about the DevOps-y surroundings. Case in point: NumPy 2.0 drops and we all woke up to pandas pipelines breaking for no obvious reason. So what would it take for #AI agents to be useful here?

We’re releasing our first attempt at mapping the abstractions for the agentic #lakehouse, using “self-repairing pipelines” as a canary test for automation in high-stakes data workflows. We build on top of bauplan APIs (which provide the underlying safety guarantees), wrapped by FastMCP, and use #smolagents (Hugging Face) for a no-nonsense, 20-line loop.

The lesson: good software practices and concise APIs are the biggest enablers of automating the DevOps-y data chores; in other words, the bottleneck here is not model intelligence, but a truly programmable lakehouse. As Alan Turing used to say: "We can only see a short distance ahead, but we can see plenty there that needs to be done."

👉 "Safe, Untrusted, Proof-Carrying AI Agents" by me and Ciro Greco is now available on arXiv: https://lnkd.in/egsam5Yx
👉 #opensource code: https://lnkd.in/epYQFzbU
👉 Video walkthrough for the lazy coders: https://lnkd.in/eTF7tWHG

See you, #agentic cowboys! #agent #lakehouse #faas #gitfordata #dataengineering
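For readers who want the shape of the "self-repairing pipeline" idea without opening the repo: below is a library-agnostic Python sketch of the control flow (run the pipeline, catch the failure, let an agent propose a patch, retry). The propose_fix stub is a hypothetical placeholder for the LLM call; it is not the bauplan, FastMCP, or smolagents API, and the real 20-line loop lives in the linked #opensource repo.

```python
"""Conceptual self-repairing pipeline loop; the agent call is a stub."""
import subprocess
from pathlib import Path

MAX_ATTEMPTS = 3


def propose_fix(code: str, error: str) -> str:
    """Placeholder for the agent: given failing code and its traceback,
    return patched code. Wire this to your agent framework of choice."""
    raise NotImplementedError("plug in your LLM agent here")


def self_repairing_run(script: Path) -> bool:
    """Run the pipeline script; on failure, ask the agent for a patch,
    rewrite the script, and retry up to MAX_ATTEMPTS times."""
    for _ in range(MAX_ATTEMPTS):
        proc = subprocess.run(
            ["python", str(script)], capture_output=True, text=True
        )
        if proc.returncode == 0:
            return True  # pipeline succeeded; safe to promote the data
        # Feed the concrete failure (e.g. a NumPy 2.0 / pandas break) back to the agent.
        patched = propose_fix(code=script.read_text(), error=proc.stderr)
        script.write_text(patched)
    return False
```

The interesting part is everything this sketch leaves out: running the attempt against an isolated data branch and only promoting verified results is exactly the safety guarantee the post attributes to the underlying lakehouse APIs, not to the agent.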
🚀 The AI Revolution in Data Engineering is HERE! 🚀

🔹️I've been integrating GitHub Copilot into my recent data engineering projects, and it's fundamentally changed the way I work.
🔹️This tool is truly an AI pair programmer, driving massive gains in efficiency and code quality.

Here’s how Copilot is transforming the data engineering lifecycle:

1. Simplified Coding & Reduced Effort: From generating complex SQL and Python ETL scripts to writing documentation strings, Copilot handles the boilerplate and repetitive code instantly. This has drastically simplified coding and significantly reduced the manual effort needed, allowing me to focus on business logic and architecture design.

▶️ Specific Use Case: Delta Lake Transformation
Working on our Databricks ETL pipeline, I had a complex PySpark task to standardize customer data. Instead of manually writing the entire block, I leveraged Copilot.

🔹️My Copilot UI Prompt:
# PySpark: Add a 'full_name' column by concatenating 'first_name' and 'last_name',
# then drop the original columns and write the resulting DataFrame as a Delta table
# partitioned by 'country' in overwrite mode.

Copilot instantly generated the complete, optimized PySpark code using withColumn, concat_ws, drop, and the final write.format('delta').partitionBy('country').mode('overwrite').save() command. This single prompt saved me several minutes of detailed, multi-step coding, streamlining our data transformations.

2. Instant Error Solving: It’s an exceptional tool for debugging. By providing highly contextual code suggestions, Copilot helps prevent common mistakes. Furthermore, with Copilot Chat, I can quickly analyze complex runtime errors and receive suggestions for immediate fixes, turning frustrating debugging sessions into swift, iterative error-solving.

3. Optimized, High-Quality Code: Copilot often suggests more idiomatic and performant code snippets. It acts as an instant refactoring assistant, leading to cleaner, more maintainable, and ultimately optimized code that runs faster, which is critical for large-scale data pipelines.

If you are a Data Engineer looking to maximize your developer productivity and ensure your code is efficient and clean, leveraging an AI assistant like Copilot is a non-negotiable step forward. What's your experience been? Drop a comment below!

#DataEngineering #GitHubCopilot #AI #Productivity #Python #PySpark #Databricks #DeltaLake #ETL
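For reference, here is roughly what that generated transformation looks like, reconstructed from the functions named above (withColumn, concat_ws, drop, and the Delta write) rather than Copilot's verbatim output; the source table and output path are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` is already provided

# Placeholder source: your raw customer table.
customers_df = spark.table("raw.customers")

# Build 'full_name', then drop the original name columns.
result_df = (
    customers_df
    .withColumn("full_name", F.concat_ws(" ", F.col("first_name"), F.col("last_name")))
    .drop("first_name", "last_name")
)

# Write as a Delta table, partitioned by country, in overwrite mode.
(
    result_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("country")
    .save("/mnt/gold/customers")  # placeholder output path
)
```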
So I went down a little data rabbit hole 🐇... with a new open-source data language called "Malloy" (created by Looker’s co-founder Lloyd Tabb). After joining the Malloy Slack, I found out the core team is already working on materializations to make the open-source language more robust for warehousing transformations.

So (while they are doing that), here is what I did in the meantime:
1. Built a clean dbt project transforming data between RAW, STAGING, and GOLDEN schemas.
2. Then rebuilt the entire thing in Malloy to do the transformations instead of dbt YAML.
3. Compared both approaches, side by side.

Same data. Two philosophies:
dbt = SQL pipelines and YAML glue.
Malloy = Semantic modelling meets composable queries.

You can watch me over-explain it in the FULL VIDEO HERE: https://lnkd.in/gQ5tb3Hh. (And yes, if you like what you see.. please subscribe, before I start explaining data as an interpretive dance.)

I have the projects in my Git repo as well (let me know if you want the link). Also, if you are interested in trying out this open-source language, you can visit https://lnkd.in/gNybg5Yj to learn more.

I'll be posting Part 2 soon as well, where I will take the outputs and build a semantic model using Credible.

#semantic #semantics #malloy #sql #dbt #data #dataanalytics #dataengineering #ai #llm #model #etl #elt
Great to get the first few glimpses of transformation and materialization in Malloy. This gives AI & data application developers and SMBs simple yet powerful tools to do the kinds of transformations that, until now, were only accessible to data engineers working in large, complex, and costly data lake or warehouse environments. Super excited to see where this goes. 🚀

👇 Check out Miles’ deep dive comparing Malloy with dbt — same data, two philosophies:
dbt → SQL pipelines + YAML glue
Malloy → Semantic modeling + composable queries
Good to see new open-source semantic modelling tools coming up. Malloy seems promising! dbt Labs, watch out. 👀
🚀 Understanding DAGs – The Backbone of Modern Data Workflows

In the world of data engineering and workflow orchestration, one concept stands tall — the DAG (Directed Acyclic Graph). A DAG is not just a fancy graph theory term — it’s a powerful abstraction that ensures our data pipelines and processes are deterministic, dependency-aware, and fault-tolerant.

Let’s break it down 👇
🔹 Directed → Every task has a clear direction (A → B → C).
🔹 Acyclic → No circular dependencies: Task C cannot depend back on Task A.
🔹 Graph → A collection of nodes (tasks) and edges (dependencies).

In tools like Apache Airflow, Prefect, and Dagster, a DAG represents how data flows — each node being a task, and each edge defining when and how that task runs.

💡 Why DAGs Matter:
✅ Clarity: You can visualize complex data pipelines easily.
✅ Resilience: Failures are isolated; recovery is predictable.
✅ Scheduling Power: Dependencies dictate execution order — not timing alone.
✅ Scalability: Modular design makes pipelines easy to extend.

For example, in Apache Airflow, every DAG is a Python script defining a set of tasks with explicit dependencies — ensuring reproducibility and consistency across runs. Here’s a simple visualization of how it works:

🧩 Extract → Transform → Load

Each step is a task. The DAG ensures "Transform" only starts after "Extract" succeeds (see the minimal sketch below).

In short, DAGs bring structure to automation — turning chaos into order, and giving engineers a way to orchestrate thousands of data jobs with confidence. If you’re working in data engineering, MLOps, or ETL pipelines, mastering DAGs is not optional — it’s foundational.

💬 What tools do you use for DAG orchestration — Airflow, Prefect, or something custom? Would love to hear your thoughts below 👇

#DataEngineering #Airflow #DAG #ETL #BigData #Python #Automation #MLOps #Prefect #Dagster #snowflake #databricks #AI #ML
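As a concrete illustration of the Extract → Transform → Load example above, here is a minimal Airflow DAG, assuming Airflow 2.x with the classic PythonOperator style; the task bodies are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw data from the source system (placeholder).
    return {"rows": 42}


def transform():
    # Only runs after `extract` succeeds, because of the dependency below.
    pass


def load():
    # Write the transformed data to the warehouse (placeholder).
    pass


with DAG(
    dag_id="etl_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # newer Airflow versions name this `schedule`
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The edges of the DAG: extract -> transform -> load
    extract_task >> transform_task >> load_task
```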
If Pandas was yesterday’s comfort, Polars is today’s necessity. In other words, Pandas made data handling simple. Polars is making it smart.

In the world of data, everyone’s chasing milliseconds. And lately, one name has been quietly redefining what “fast” really means and that is Polars.

For years, Pandas was the comfort zone for almost every data professional. It worked, until our data outgrew it. The sluggish joins, high memory usage, and single threaded performance started showing their age. Then came Polars, built on Rust, with one simple goal and that is to make data processing faster, safer, and scalable. And it’s not just hype anymore. The adoption stories are speaking for themselves.

➡️ In one of the best reads I’ve come across recently, Paul Duvenage shared how moving from Pandas to Polars resulted in cleaner ETLs, simpler pipelines, and faster execution (https://lnkd.in/g9RCZvtr).
➡️ Yuki Kakegawa backed this up with real world benchmarks where Polars consistently outperformed Pandas across data sizes and workloads (https://lnkd.in/gsm3hRQd).
➡️ Even Accel's article described it beautifully: Polars is transforming data processing on modern hardware, bridging the gap between developer experience and system performance (https://lnkd.in/g-JBwHuY).

With support for lazy evaluation, multi threading, and predictable memory use, Polars is redefining what “efficient” means in data engineering.

🌟 And just when you think the momentum around Polars couldn’t get any stronger, the news drops that it’s raised $100 million in Series A funding (https://lnkd.in/gUp77T23). Polars is a reminder that when you build with clarity and conviction, the world eventually catches up.

‼️ Now that you’ve come this far, do check out this brilliant article by Marco Gorelli on an important yet often unnoticed difference between Polars and Pandas, the group by behavior: https://lnkd.in/gvWS2mUy.

#Polars #Pandas #RustLang #Python #DataEngineering #DataScience #BigData #HighPerformanceComputing #ModernDataStack #DataProcessing #ParallelComputing #DataAnalytics #ETL #QueryOptimization #Developers #FutureOfData #OpenSource
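To ground the lazy evaluation point: in Polars you build a query plan on a LazyFrame and nothing runs until collect(), which lets the engine push filters down and parallelize across cores. A minimal sketch, with a placeholder file and column names:

```python
import polars as pl

# scan_csv builds a lazy query plan without reading the file yet.
lazy = (
    pl.scan_csv("events.csv")                      # placeholder input file
    .filter(pl.col("amount") > 0)                  # pushed down before the full read
    .group_by("country")                           # `groupby` on older Polars versions
    .agg(pl.col("amount").sum().alias("total"))
    .sort("total", descending=True)
)

df = lazy.collect()  # the whole plan is optimized and executed here, in parallel
print(df.head())
```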
🚀 Powerful Open-Source Tools to Supercharge Your AI & Data Projects

From building full-stack AI apps in Python to enriching datasets automatically, these tools make LLM development and data workflows faster and smarter.

☺ 1. Reflex Build – Full-Stack LLM Apps in Pure Python
Imagine creating an entire AI web app — backend and frontend — with a single prompt. That’s what Reflex Build lets you do.
○ Describe your app in natural language
○ Auto-generates Python code for both frontend & backend
○ Live preview updates instantly
○ Integrates seamlessly with Python libraries & external APIs
Perfect for data scientists and ML engineers who want to go from prototype to production—all in Python.
► Try it here: https://build.reflex.dev/

☺ 2. Mito AI – Your Intelligent Assistant in Jupyter Notebooks
Mito AI brings natural language and automation to your notebooks.
► Key Features:
○ Generate data pipelines automatically from prompts
○ Debug contextually inside your notebook
○ Convert Excel operations into clean Python code
○ Query databases using plain English
Open source, intuitive, and productivity-boosting for every data workflow.
► GitHub: https://lnkd.in/eZ-HqjQ5

☺ 3. FireEnrich – Transform Email Lists into Structured Datasets
Turn a plain list of emails into a rich, structured dataset with FireEnrich, a multi-agent AI system. Here’s what happens under the hood:
○ Discovery Agent → Finds basic company info
○ Profile Agent → Identifies industry & market
○ Financial Agent → Gathers funding data
○ Tech Stack Agent → Detects technologies
○ General Agent → Finds leadership details
The final output? A verified dataset ready for analytics or enrichment — all powered by LLMs.
► GitHub: https://lnkd.in/eHjsGekZ

💡 Whether you’re building AI products, cleaning messy data, or exploring agent architectures — these open-source tools are worth a look.

#AI #LLMs #DataScience #Python #OpenSource #Jupyter #MachineLearning #Agents #DataEngineering #AIApplications
Sumble replaced fragmented Jupyter notebooks and brittle dashboards with marimo. As a fast-growing team building AI-powered account intelligence tools, they needed a better way to turn notebooks and data insights into dynamic internal tools and dashboards.

Historically, their team struggled with fragmented data app workflows and throwaway notebooks, leading to painful app deployments and constant maintenance overhead. Today, marimo is the single source of truth for their entire organization, seamlessly powering 25+ internal applications and dashboards and unifying exploration, collaboration, and deployment in one environment.

Founder/CEO Anthony Goldbloom believes "There have been a few technologies that have been truly transformational for me... marimo is one of them."

Read the full case study: https://lnkd.in/gke-v7ku
Great idea! The first time I saw custom linting applied was in Metaflow, and it saved me so much dev time.