While you probably don't have to worry about aliens invading on the fourth of July, you should be aware that text embeddings may expose your private information.
Fortunately, it's an addressable risk. Our Senior AI Scientist, Joseph Ferrara, PhD, covers how to mitigate the problem in a PDF guide.
Broadly, the steps are:
- Extract the text
- De-identify the sensitive info
- Chunk the resulting text
- Embed it in a Pinecone database
- Perform some test queries
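The steps above can be sketched in a few lines of Python. The regex patterns, chunk sizes, and sample text are illustrative assumptions (a production de-identifier such as Tonic Textual uses NER models, not regexes), and the actual embed/upsert calls to a Pinecone index are only indicated in comments since they need credentials:

```python
import re

# Hypothetical PII patterns for illustration only; real pipelines use
# trained NER models to find names, addresses, etc.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace detected PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows ready for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Contact Jane at jane.doe@example.com or 555-123-4567 for the report."
clean = deidentify(doc)
chunks = chunk(clean)
# Each chunk would then be embedded and upserted into a Pinecone index,
# after which you run test queries to confirm retrieval still works.
```

Because de-identification happens before embedding, the raw PII never reaches the vector database.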
If you want to see more detail and sample code, check out the how-to guide linked in the comments.
#RAG #AI #embeddings
Reinforcement Learning Team Leader & BO Tech Expert @ Huawei Research London - Advisor @ Sanome - Honorary Assistant Professor at UCL. Ex-@Princeton, Ex-@Upenn. All opinions are my own.
As you know, I am trying to cover the MCTS-for-LLMs literature and found 86 papers between 2023 and 2024. To be frank, I don't know how to grasp all that knowledge.
So I thought of building what I call paper cards. Making one seemed to help me summarise the paper, and I also added points that were unclear and that I wanted to come back to.
In any case, I thought I'd share one with you; maybe you'll find it helpful too. If so, I will keep sharing them as I read through the papers.
#AIart #AI #MachineLearning
As a follow-up to Haitham’s review of MCTS papers…
I want to remind everyone that MCTS is a kind of policy in my class of direct lookahead approximations (DLAs) for solving sequential decision problems. DLAs are just one of four classes of policies.
MCTS is one example of a stochastic DLA, which means it solves a stochastic optimization problem (said differently, an approximate sequential decision problem that I call the lookahead model) to make decisions that are then implemented in a sequential decision problem called the base model.
See chapter 19 of
https://lnkd.in/dB99tHtM
for a complete discussion of DLA policies, where I offer notation that distinguishes between base models and lookahead models. Since the lookahead model is another (usually simplified) sequential decision problem, we have to choose a policy for making decisions within it. I sometimes call this the “lookahead policy” or the “policy-within-a-policy”.
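To make the base-model/lookahead-model split concrete, here is a hypothetical toy in Python: a rollout-based stochastic lookahead policy (a much simpler cousin of MCTS) choosing actions on a number line. All names, rewards, and parameters are invented for illustration; the random rollout policy inside `rollout_value` is the "policy-within-a-policy":

```python
import random

# Toy base model: walk on a number line; reward is negative distance to a goal.
GOAL, HORIZON, ROLLOUTS = 5, 4, 200

def step(state: int, action: int) -> tuple[int, float]:
    nxt = state + action
    return nxt, -abs(GOAL - nxt)

def rollout_value(state: int, depth: int, rng: random.Random) -> float:
    """Estimate a state's value by simulating a random lookahead policy."""
    total = 0.0
    for _ in range(depth):
        state, r = step(state, rng.choice((-1, 1)))
        total += r
    return total

def dla_policy(state: int, rng: random.Random) -> int:
    """Direct lookahead: approximately solve the lookahead model per action."""
    def q(a: int) -> float:
        nxt, r = step(state, a)
        sims = [rollout_value(nxt, HORIZON - 1, rng) for _ in range(ROLLOUTS)]
        return r + sum(sims) / ROLLOUTS
    return max((-1, 1), key=q)

rng = random.Random(0)
state = 0
for _ in range(8):                     # act in the base model
    state, _ = step(state, dla_policy(state, rng))
print(state)                           # should have drifted toward the goal
```

MCTS replaces the uniform rollouts with a tree built via selection, expansion, simulation, and backpropagation, but the structure, simulate a lookahead model to pick each base-model decision, is the same.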
Those who use GLRT classifiers in their work know that the GLRT, specifically with nested hypothesis testing, suffers from always choosing the model with more parameters (the more complex model, if you will). This is a disadvantage in cases of model order selection and anomaly detection.
Algorithms that "correct" this have been employed by many of us, namely BIC, MDL, AIC, etc.
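A quick numerical illustration (with made-up data) of why such corrections are needed: for nested least-squares models, the larger model's residual sum of squares can never exceed the smaller model's, so an unpenalized likelihood comparison always picks the bigger model, and criteria like AIC/BIC restore fairness by charging for each extra parameter:

```python
# Made-up data that is roughly constant, so any fitted slope is spurious.
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 1.9, 2.2, 2.0, 1.8, 2.3]
n = len(x)
mean_y = sum(y) / n

# H0: constant-mean model.
rss0 = sum((yi - mean_y) ** 2 for yi in y)

# H1: line y = a + b*x (ordinary least squares, closed form).
mean_x = sum(x) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x
rss1 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# The nested larger model always fits at least as well, even on noise,
# which is exactly the bias BIC/MDL/AIC-style penalties compensate for.
print(rss1 <= rss0)  # True
```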
In the paper below, Steven Kay proposes a transformation that "levels the playing field" for GLRT applications. The authors prove that, under certain conditions, just as the probability integral transform guarantees that random variables passed through their own CDF come out Uniform, a similar transform yields N(0,1) random variables asymptotically.
This transformation is built on the convex conjugate of the Cumulant Generating Function (CGF) of the arbitrary i.i.d. random variables to be transformed. The transform is also referred to as the Legendre Transform (LT) in the paper.
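The probability-integral-transform analogy is easy to check numerically. The sketch below verifies the classical Uniform result for exponential samples; Kay's CGF/Legendre-based transform itself is developed in the paper and not reproduced here:

```python
import math
import random

# If X ~ Exponential(1), its CDF is F(x) = 1 - exp(-x), and U = F(X)
# should be Uniform(0, 1): mean 0.5, variance 1/12.
rng = random.Random(42)
xs = [rng.expovariate(1.0) for _ in range(100_000)]
us = [1.0 - math.exp(-x) for x in xs]

mean_u = sum(us) / len(us)
var_u = sum((u - mean_u) ** 2 for u in us) / len(us)
print(round(mean_u, 2), round(var_u, 3))  # close to 0.5 and 1/12 ≈ 0.083
```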
Read the full paper summary on my website: https://lnkd.in/g8RB2pQf
#Statistics #GLRT #HypothesesTesting #classification #anomalydetection #ML #AI #datascience
The University of Pennsylvania's CSSLab has developed an AI-powered Media Bias Detector to reveal subtle biases in news coverage. By classifying articles by topic and analyzing tone and political leanings, it provides a detailed visual representation of how various news outlets report differently on the same issues. This tool helps users critically evaluate news sources and understand media bias. #MediaBias #AI #NewsAnalysis #AI4Journalism https://lnkd.in/gMjPpAdu
Building AI Solutions | AI and Machine Learning | Data Science | Management Consulting | Strategy & Operations | Digital Transformation | Project Management |
Have you ever wondered how AI can seem so intelligent? The secret lies in Bayes' theorem, a fundamental concept in probability theory.
Bayes’ theorem tells us the probability of something happening given the evidence we have: P(A|B) = P(A) * P(B|A) / P(B)
Where A is 'something' and B is 'evidence'.
Think of Bayes' theorem as a detective, using evidence to update its beliefs. For example, an image classifier "detects" whether a photo is of a cat or a dog by comparing it to past examples. Similarly, advanced AI like GPT-4 and Midjourney predict what humans might create, based on their training data.
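Here is the formula applied to a small spam-filter example, with all probabilities invented purely for illustration:

```python
# A = "email is spam", B = "email contains the word 'free'".
p_spam = 0.2                    # P(A): prior belief before seeing evidence
p_free_given_spam = 0.6         # P(B|A)
p_free_given_ham = 0.05         # P(B|not A)

# P(B) via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B).
p_spam_given_free = p_spam * p_free_given_spam / p_free
print(round(p_spam_given_free, 2))  # 0.75
```

Seeing the word lifts the spam probability from 20% to 75%: exactly the detective-style belief update described above.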
But Bayes' theorem isn't just for AI. Our brains use it too! That's why optical illusions trick us and why psychedelics can create mind-bending experiences. It even explains why people can have vastly different interpretations of the same evidence.
So, next time you encounter AI or even everyday life, remember Bayes' theorem. It's the hidden force shaping everything from technology to human perception.
#AI #BayesTheorem #MachineLearning #ArtificialIntelligence
**In the picture**: A Jakarta Governor candidate claimed during a debate that the pandemic was a hidden foreign agenda and that AI was a spy tool (from the word 'intelligence').
The Future of RAG: More Questions, More Complexity
In the realm of Retrieval-Augmented Generation (RAG), the journey is far from straightforward. As we shift from simple keyword searches to sophisticated question-answering systems, the challenges multiply.
The retrieval component remains a significant hurdle. There's no 'one-size-fits-all' solution, and as user expectations evolve, so does the complexity of the systems we develop.
Data suggests that as we improve our models, the intricacies of retrieval will only deepen. This highlights the necessity for ongoing innovation and research in the field of AI.
Join the conversation on how we can tackle these challenges head-on! What are your thoughts on the future of AI in RAG? Let's discuss it!
#AI #MachineLearning #GenerativeAI #RetrievalAugmentedGeneration #DataScience #TechInnovation #FutureOfWork
Techy, Tacky & Witty! The one & only Stephen Fry!
Taking a cue from Stephen Fry, I also hereby choose to render Artificial Intelligence as Ai instead of AI from now on, in order to make life easier for people called Alok, Alpesh, Alisha, Alvira, et al. I cannot imagine the Blinkit founder being too pleased when he reads that Al is an existential threat to all Kirana stores!
#Ai #stephenfry https://lnkd.in/g9MYVm7N
$221K vs. $715 is a meaningful difference in a comparison of Yurts and Anthropic.
Enterprises and the DOD need to move beyond basic open-source tools and proprietary API vendors that are charging based on tokens.
Blog Post: https://lnkd.in/d__xrtJA
> Retrieval-augmented generation (RAG) systems use many different algorithms to chunk, embed, and rank textual content for enabling natural language based question-answering on private knowledge bases. In a recent article, Anthropic introduced Contextual Retrieval, a new chunking algorithm that surpasses state-of-the-art (SOTA) methods and released the “Codebases dataset” for its benchmarking. In this article, we evaluate Yurts’ RAG pipeline on this new benchmarking dataset. This evaluation highlights that Yurts’ RAG system matches the performance of Contextual Retrieval, while operating at only 1/300th of the cost, underscoring our commitment to providing high performance solutions that are also cost-effective for our customers. Through our analysis, we also highlight some key challenges that arise when using new benchmarking datasets to evaluate RAG platforms, offering insights for those navigating this evolving space.
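For intuition, here is a minimal hypothetical sketch of the contextual-chunking idea the excerpt describes: prepend situating context to each chunk before it is embedded, so the chunk remains interpretable in isolation. In a real pipeline the context string would come from an LLM call; here `describe_context` is just a stub, and all names are invented:

```python
def chunk_text(text: str, size: int = 80) -> list[str]:
    """Naive fixed-size character chunking, for illustration only."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def describe_context(doc_title: str, chunk: str) -> str:
    # Stand-in for "ask an LLM to situate this chunk within the document".
    return f"From the document '{doc_title}': "

def contextualize(doc_title: str, text: str) -> list[str]:
    """Prepend document-level context to every chunk before embedding."""
    return [describe_context(doc_title, c) + c for c in chunk_text(text)]

chunks = contextualize("RAG benchmark notes", "Contextual Retrieval prepends "
                       "situating text to every chunk before it is embedded.")
print(chunks[0])
```

Each contextualized chunk, not the bare chunk, is what gets embedded and indexed, which is why retrieval quality improves on questions whose answers depend on document-level context.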
Frank Kane brought up some interesting views on the “AI war” between the US and China. I always think of this confrontation as a friendlier Cold War. However, there are a few issues:
* The warmest part of the war is about hardware (design, manufacturing, etc.). Both the US and China playing the restrictions game is not good for either.
* The war for talent probably favors the US - we still attract the best and brightest, certainly produce the more innovative and leading brands, and capitalism will drive what works. Hopefully, we sort out the frictions around student visas and immigration. This is a relatively simple fix if the politicians want it. Enough said about that.
* While numbers (where China wins) are not the same as quality (where the US is probably leading, but China is catching up quickly), we should pay attention to both.
* Taiwan, in my opinion, and the noise around it on both sides, is a bogeyman. In a decade or so, by which time hopefully both sides have caught up at least to some degree, Taiwan may become a non-issue. Hopefully that will also reduce or even eliminate the sabre-rattling on both sides. Good for all!
* I believe the US still lags China in terms of regulation and safety around AI. While the all-powerful government can push its way through in China, we are still just trying to understand this.
If you have a different opinion, let me know in the comments.
Note: Keep it civil please!
I'm busy doing some research and writing some examples to update my AI & Machine Learning course. Advanced RAG techniques, ways to measure RAG, LLM Agents, Swarms of Agents, etc.
What keeps surprising me is how many influential AI papers and new systems come from China lately. Remember Devin? ChatDev did it first, in China. Prompt compression techniques for RAG? China. Semantic chunking? China again.
I'm not a fan of any sort of tribalism, so I don't see this as good or bad. But it's interesting.
https://www.tonic.ai/blog/how-to-create-de-identified-embeddings-with-tonic-textual-pinecone