Hey folks! I'd like to more formally introduce a few of the projects we're working on to improve large-scale bioinformatics analyses (one of which was developed during St. Jude's KIDS24 Biohackathon)! If you're not aware, my team started a project called St. Jude Rust Labs (https://lnkd.in/eXY_9dJ8), where we're building a new foundation for bioinformatics analysis in Rust. This cuts across more than just workflow execution, but recently we've been focused primarily on improving the experience of working with the Workflow Description Language (WDL) at scale. To that end, we've released a number of projects, including:

- A complete lexer/parser for the WDL v1.2 language (https://lnkd.in/eGWHSM2g). Anyone can use this foundation to build tools for WDL.
- A VSCode extension that includes a Language Server Protocol (LSP) implementation for linting and validation directly in your editor (https://lnkd.in/euApG9tF).
- And, announcing today, we've started down the road of writing our own WDL execution engine, spread across the Crankshaft (https://lnkd.in/et_RFkA3) and Sprocket (https://lnkd.in/eUTuPE68) projects.

Crankshaft was prototyped during the St. Jude Biohackathon last week. It's a _headless_ workflow execution engine, meaning that, in theory, others could come along and write drivers for Crankshaft built on Nextflow, CWL, Snakemake, etc. As I said earlier, our team is really focused on WDL specifically, so we're going to continue building out a "head unit" for Crankshaft using WDL. That being said, I would love to see other community projects popping up and using the core machinery for other workflow languages.

Thanks to all of the individuals who participated on our Biohackathon team (Kevin Benton, suchitra chavan, Braden Everson, Andrew Frantz, Michael Gattas, Peter Huene, and John McGuigan)!
Clay McLeod’s Post
More Relevant Posts
Sharpening Search Algorithms: Binary & Linear Search Project 🚀

Proud to share my recent project, where I implemented Binary Search and Linear Search algorithms as part of my hands-on learning journey! 🖥

The primary goal of this project was to enhance my understanding of core searching techniques and reinforce the fundamentals of data structures. Through this, I gained deeper insights into how algorithms handle sorted and unsorted datasets, their respective time complexities, and why efficiency matters in real-world applications.

Key highlights of the project:
✔ Implementation of Linear Search with O(n) complexity.
✔ Development of Binary Search using a divide-and-conquer approach, achieving O(log n) efficiency.
✔ Well-documented code and a detailed README for clarity and ease of understanding.
✔ Explored edge cases and tested scenarios to solidify my learning.

Additionally, I shared my work on GitHub, ensuring the repository is accessible and easy to navigate. This project not only strengthened my technical skills but also provided an opportunity to focus on producing high-quality, maintainable code.

Check out my GitHub repository here: [Insert Link]

Feel free to explore the code and provide your feedback; it's always great to learn and grow from the community's insights! 💡 Let's connect and discuss more about algorithm optimization and data structures! 🤝

#BinarySearch #LinearSearch #CodingJourney #Algorithms #DataStructures #GitHub

This is my GitHub account link: [https://lnkd.in/deEZGbAU]
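The contrast between the two techniques is easy to see side by side. A generic sketch of both algorithms (illustrative code, not the project's actual implementation):

```python
def linear_search(items, target):
    """O(n): scan every element until the target is found; works on unsorted data."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1  # not found

def binary_search(sorted_items, target):
    """O(log n): repeatedly halve the search interval; requires sorted input."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1  # not found
```

The divide-and-conquer step is why binary search scales so much better: each comparison discards half the remaining candidates.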
#DailyRTips 30!! Today, I want to share the function list.files() with you! This function looks into a directory and returns a character vector containing the names of the files in that directory. In the example, we are looking into a folder called "R".

Hey everyone, I am going to pause #DailyRTips for now while I learn more about this language and read more to upskill. I might be back some time in the future, however! :)

#Rstats #RProgramming #DataScience
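For readers coming from other languages, the same idea exists in most standard libraries. A rough Python analogue (using Python's os module; this is a comparison sketch, not part of the R tip itself):

```python
import os

def list_files(path="."):
    """Return the names of the entries in a directory, like R's list.files().
    os.listdir returns names only (no paths); sorting gives a stable order."""
    return sorted(os.listdir(path))
```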
🚀 Exciting Project Update: Predicting Iris Species with FastAPI! 🌸

I'm thrilled to share my latest project, which combines data science and web development to create a seamless prediction tool for iris species classification. Leveraging the power of FastAPI, I've developed a web application that allows users to input iris flower measurements and receive real-time predictions along with probability scores.

🔍 Project Highlights:
- Technology Stack: FastAPI, Uvicorn, Docker, Jinja2, Scikit-Learn, Joblib
- Features:
  - User-Friendly Interface: Clean and intuitive UI for easy data entry.
  - Real-Time Predictions: Instant classification of iris species based on user input.
  - Detailed Results: Probability scores for each species to provide a more comprehensive prediction.

📂 Key Components:
- main - The heart of the FastAPI app, handling routing and prediction logic.
- app - Contains scripts for model inference, training, and evaluation.
- Dockerfile & docker-compose.yaml - Ensures smooth deployment and scalability.

This project not only demonstrates the practical use of machine learning in web applications but also highlights the versatility and efficiency of FastAPI for creating robust, production-ready services. I'm looking forward to feedback and any suggestions for further improvements. Feel free to connect if you're interested in discussing this project or exploring similar technologies!

Github URL - https://lnkd.in/dM2gc3d9
Dockerhub Image - https://lnkd.in/dAH97waR
Demo App Link - https://lnkd.in/deyheCSQ

#FastAPI #Uvicorn #ScikitLearn #MachineLearning #WebDevelopment #Docker #DataScience #Python #DevOps
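The "prediction plus probability scores" logic behind an endpoint like this can be sketched without any web framework. Below is a toy stand-in for the trained scikit-learn model: a nearest-centroid classifier with softmax-normalized scores. The centroid values and species names are illustrative placeholders, not the project's real model:

```python
import math

# Illustrative per-species centroids over four measurements
# (sepal length, sepal width, petal length, petal width) -- made-up values.
CENTROIDS = {
    "setosa": (5.0, 3.4, 1.5, 0.2),
    "versicolor": (5.9, 2.8, 4.3, 1.3),
    "virginica": (6.6, 3.0, 5.6, 2.0),
}

def predict(measurements):
    """Return (best_species, {species: probability}) for a 4-tuple of measurements."""
    # Score each class by negative Euclidean distance to its centroid...
    scores = {name: -math.dist(measurements, c) for name, c in CENTROIDS.items()}
    # ...then turn scores into probabilities with a softmax.
    z = sum(math.exp(s) for s in scores.values())
    probs = {name: math.exp(s) / z for name, s in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs
```

In the real app, the `predict` body would instead call the joblib-loaded scikit-learn model's `predict_proba`.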
It's the weekend! Time to work on the open_prompt DuckDB #Community #Extension inspired by the amazing MotherDuck prompt function! > SELECT open_prompt('Write a one-line poem about ducks') as text; Designed to provide fast and uncomplicated access to any local or remote LLM using #Ollama (CPU only or GPU) or any #OpenAI completions compatible API or Service 🧠 This weekend's challenge: Structured JSON output 🚀🚀🚀 EDIT: MISSION ACCOMPLISHED in v0.0.3 of open_prompt! 🚀🚀🚀 #DuckDB #Extension #LLM #LLMs #OpenAI #Completions #Prompt #AI #Quackscience #Opensource #OLAP #Database #DBMS
Old blog post, but it reinforces the need to share code instead of everyone inventing their own version of the wheel. In response 🚀 Liminal is starting an initiative to open-source as much of our work as we can and organize it into a repository of general tools for life scientists. This includes work from our time at Purdue University & Liminal 🐼 - making all of our data processors open and accessible, for the good of science.

Previously, we created https://lnkd.in/eVgyzT7j, but now we're updating with a new repository: one that's easier to download, easier to get more information on, and, most importantly, easier for the community to contribute to. While paying homage to Ursula K. Le Guin's coined term 'Ansible' would be an ideal name, we're keeping our tool named Liminal. Immediate transmission of perfect knowledge is nice, but the world doesn't work that way [yet]. Liminal is a transitional phase, which represents the perpetual state of science: always on the verge of transformation.

If you're a scientist or bioinformatician with code to contribute, or are interested in chatting, feel free to message me. As we develop this repository, we will update our Liminal community (as well as the general community) to ensure everyone has access to our work. https://lnkd.in/ekwcrPWv
Both Pandas and Polars are robust data manipulation tools, but their syntaxes differ subtly. Polars tends to use more explicit, verb-based methods, while Pandas leverages more concise bracket notation. The choice between Pandas and Polars often comes down to performance needs, library familiarity, and personal preference. Polars is known for its speed and efficiency. Pandas, on the other hand, has a larger ecosystem and is more widely adopted. 📘 Full code: https://bit.ly/3LA9wqY #DataScience #pandas #FeatureEngineering #DataPreprocessing
🌳 Exploring Deletion in Binary Search Trees: A Comprehensive Guide

In the realm of Binary Search Trees (BSTs), understanding the deletion process is crucial for maintaining data integrity and optimizing tree structure. Let's delve into the intricacies of BST deletion, exploring each case and its implications.

🚀 Understanding Deletion in BSTs:
Deletion in BSTs involves various scenarios, each requiring careful handling to preserve the tree's properties. Here's an in-depth look at the deletion process:

🔍 Case 1: Node with No or One Child:
👉 If the node to be deleted has no children or only one child:
➡️ Simply remove the node and adjust the tree accordingly.

🔍 Case 2: Node with Two Children:
👉 If the node to be deleted has two children:
➡️ Find the inorder successor (the smallest node in the right subtree) or the inorder predecessor (the largest node in the left subtree).
➡️ Replace the node to be deleted with the inorder successor/predecessor.
➡️ Delete the inorder successor/predecessor from its original position.

🕒 Time Complexity:
👉 Deletion: O(log n) on average; O(n) in the worst case, where 'n' is the number of nodes in the tree.

💡 Real-world Application:
👉 Deletion in BSTs is commonly used in database management systems, where efficient data modification is essential for maintaining data integrity and system performance.

🔗 GitHub Repository: https://lnkd.in/gnKbjXJX

By mastering deletion in BSTs, we unlock the potential for efficient data management and streamlined operations. Let's continue our journey of exploration and learning in the realm of Binary Search Trees!

#BinarySearchTree #DataStructures #Algorithms #LearningJourney #management #learning #innovation #algorithms #programming #computerscience #datascience #GauravSah #16day
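The two cases above can be sketched in a short, self-contained implementation (a generic illustration using the inorder-successor strategy, not the linked repository's code):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard BST insert, used here to build test trees."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    """Delete `key` from the BST rooted at `root`, returning the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Case 1: zero or one child -- splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 2: two children -- copy in the inorder successor
        # (smallest key in the right subtree), then delete it there.
        succ = root.right
        while succ.left:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root

def inorder(root):
    """Inorder traversal; always sorted for a valid BST."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []
```

Deleting the root of a seven-node tree exercises Case 2, and deleting a leaf exercises Case 1; in both cases the inorder traversal stays sorted.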
🚀 Choosing the Right Tool for Large Datasets: Pandas, Dask, or PySpark? 🚀

Ever wondered which tool to use for handling large datasets on your laptop? 🤔 If you're working with a personal laptop with limited RAM, here's a quick guide to help you decide between Pandas, Dask, and PySpark.

🟢 For datasets less than 1GB: Pandas is perfect: simple and efficient!
🟡 For datasets between 1GB and 100GB: Dask is your friend. It scales well on your laptop, handling larger data with ease.
🔵 For datasets larger than 100GB: PySpark is the way to go. It's built for massive datasets and distributed computing.

#تونس_أفضل #DataScience #DataEngineering
Just finished a really challenging module about vector databases 🎉

In this module, I've learnt about:
- Semantic search and how it is more accurate than keyword search.
- Implementing semantic search using Elasticsearch.
- Implementing advanced techniques for more accurate results.
- Evaluating retrieval performance by generating ground-truth data.
- Discovering evaluation metrics like Hit-Rate and Mean Reciprocal Rank.

We used Elasticsearch to perform both keyword and semantic search. While semantic search is more accurate, it generally requires a lot of compute to embed both the documents you store in the DB and the query you search with.

Playing with the data is what leads you to the most interesting things. Try more transformers and play with the search query to improve your learning experience 😬

Special thanks to DataTalksClub and Alexey for this valuable resource! Looking forward to diving into the next modules.

#Elasticsearch #vectorsearch #llmzoomcamp
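Both metrics mentioned above are easy to compute once you know, for each query, the rank at which the relevant document appeared. A minimal sketch (not the course's code; `rankings` holds the 0-based rank of the relevant document per query, or None when it wasn't retrieved):

```python
def hit_rate(rankings, k=5):
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = sum(1 for r in rankings if r is not None and r < k)
    return hits / len(rankings)

def mrr(rankings):
    """Mean Reciprocal Rank: average of 1/(rank+1), counting misses as 0."""
    return sum(0.0 if r is None else 1.0 / (r + 1) for r in rankings) / len(rankings)
```

Hit-Rate only asks "was it in the top k?", while MRR also rewards the relevant document appearing closer to the top.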
🚀 I'm thrilled to share this week's project -> CGAN-MNIST Refactored! This endeavor started as an inspiration from a fantastic Jupyter notebook created by Amir Hossein Fouladi and shared here on LN and GH. My project builds upon Amir's original code, refining and extending it for enhanced usability and functionality.

🔍 What's Inside?
- A CGAN training module with the MNIST dataset
- A command-line interface for ease of use and accessibility.
- Model saving and loading capabilities to pick up right where you left off.
- Customizable training sessions with configuration files for each run.
- Visual progress tracking with images saved per epoch and a snazzy progress bar.
- Enhanced performance with configurable learning rates and diverse optimizers.
- And much more, including some snazzy ASCII art for the CLI enthusiasts!

💡 Why This Project?
My goal was to provide an example POC-style repository that's easily adaptable and user-friendly for fellow developers and researchers. To really showcase what's possible. To take the Jupyter notebook and make it a production-ready script.

👥 Let's Connect!
I'm eager to connect with like-minded individuals passionate about advancing the field of machine learning. Whether you have feedback, questions, or just want to chat about the potential applications of this project, feel free to reach out or follow me here on LinkedIn.

🙏 Acknowledgements
A huge shoutout to the open-source community. Your feedback and contributions are what drive projects like this forward.
- Amir Hossein Fouladi's original code -> https://lnkd.in/ePywdHiW

🔗 Check out my project here: -> https://lnkd.in/eMeZtYhC

#MachineLearning #DataScience #OpenSource #CGAN #MNIST #GitHubProjects #ArtificialIntelligence
Nice to see it come to fruition!!