Feature Preview: Custom Pregel
Complex Graph Algorithms made Easy
@arangodb @joerg_schad @hkernbach
tl;dr
● “Many practical computing problems concern large
graphs.”
● ArangoDB is a “Beyond Graph Database”
supporting multiple data models around a scalable
graph foundation
● Pregel is a framework for distributed graph
processing
○ ArangoDB supports predefined Pregel algorithms, e.g.
PageRank, Single-Source Shortest Path and Connected
Components.
● Programmable Pregel Algorithms (PPA) allow
adding/modifying algorithms on the fly
Disclaimer
This is an experimental
feature and especially the
language specification
(front-end) is still under
development!
Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP Hana
● PhD Distributed DB
Systems
● Twitter: @joerg_schad
Heiko Kernbach
Core Engineer (Graphs Team)
@ArangoDB
● Graph
● Custom Pregel
● Geo / UI
● Twitter: @hkernbach
● Slack:
hkernbach.ArangoDB
● Open Source
● Beyond Graph Database
○ Key/value and document stores, connected by
scalable graph processing
● Scalable
○ Distributed Graphs
● AQL - SQL-like multi-model query language
● ACID transactions, including multi-collection
transactions
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
Pregel Max Value
While not converged (each iteration is one superstep):
Communicate: send own value to neighbours
Compute: own value = max(own value, all received messages)
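The max-value loop above can be simulated in a single process to show how values spread one hop per superstep. This is a toy sketch, not ArangoDB code: the function name `pregelMaxValue` is hypothetical, "converged" is approximated as "no vertex changed its value", and every neighbour id is assumed to also appear as a key in `values`.

```javascript
// Toy, single-process simulation of the "Pregel Max Value" example.
// neighbours: adjacency list, vertex id -> array of neighbour ids.
// Each iteration of the while loop is one superstep.
function pregelMaxValue(values, neighbours, maxSupersteps = 100) {
  let current = { ...values };
  let supersteps = 0;
  let changed = true;
  while (changed && supersteps < maxSupersteps) {
    // Communicate: every vertex sends its own value to its neighbours.
    const inbox = {};
    for (const v of Object.keys(current)) inbox[v] = [];
    for (const [v, outs] of Object.entries(neighbours)) {
      for (const n of outs) inbox[n].push(current[v]);
    }
    // Compute: own value = max(own value, all received messages).
    changed = false;
    const next = {};
    for (const v of Object.keys(current)) {
      next[v] = Math.max(current[v], ...inbox[v]);
      if (next[v] !== current[v]) changed = true;
    }
    current = next;
    supersteps += 1;
  }
  return { values: current, supersteps };
}

// Usage: an undirected path a - b - c - d.
const result = pregelMaxValue(
  { a: 3, b: 6, c: 2, d: 1 },
  { a: ["b"], b: ["a", "c"], c: ["b", "d"], d: ["c"] }
);
console.log(result); // { values: { a: 6, b: 6, c: 6, d: 6 }, supersteps: 3 }
```

The value 6 needs two supersteps to reach `d`; the third superstep changes nothing, which is what stops the loop.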
ArangoDB and Pregel: Status Quo
● https://www.arangodb.com/docs/stable/graphs-pregel.html
● https://www.arangodb.com/pregel-community-detection/
Available Algorithms
● PageRank
● Seeded PageRank
● Single-Source Shortest Path
● Connected Components
○ Component
○ WeaklyConnected
○ StronglyConnected
● Hyperlink-Induced Topic Search (HITS)
● Vertex Centrality
● Effective Closeness
● LineRank
● Label Propagation
● Speaker-Listener Label Propagation
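// Run the built-in PageRank algorithm on the graph "graphname" (arangosh)
const pregel = require("@arangodb/pregel");
const executionId = pregel.start("pagerank", "graphname", {
  maxGSS: 100, threshold: 0.00000001, // at most 100 global supersteps; convergence threshold
  resultField: "rank"                 // vertex attribute that receives the rank
});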
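To make the `maxGSS` and `threshold` options more concrete, here is a plain single-process PageRank power iteration that stops either at the superstep limit or once the largest per-vertex change falls below the threshold. The damping factor 0.85, this exact convergence rule, and the lack of special handling for vertices without outgoing edges are assumptions for illustration; ArangoDB's built-in implementation may differ in detail.

```javascript
// Single-process PageRank sketch illustrating what maxGSS and threshold control.
// ASSUMPTIONS (not taken from ArangoDB): damping = 0.85, convergence when the
// largest per-vertex change is below `threshold`, no dangling-vertex handling.
function pageRank(outEdges, { maxGSS = 100, threshold = 1e-8, damping = 0.85 } = {}) {
  const vertices = Object.keys(outEdges);
  const n = vertices.length;
  let rank = Object.fromEntries(vertices.map((v) => [v, 1 / n]));
  let gss = 0;
  while (gss < maxGSS) {
    // Message phase: each vertex sends rank / outDegree along its outgoing edges.
    const incoming = Object.fromEntries(vertices.map((v) => [v, 0]));
    for (const v of vertices) {
      for (const target of outEdges[v]) incoming[target] += rank[v] / outEdges[v].length;
    }
    // Compute phase: combine received contributions, track the largest change.
    let maxDelta = 0;
    const next = {};
    for (const v of vertices) {
      next[v] = (1 - damping) / n + damping * incoming[v];
      maxDelta = Math.max(maxDelta, Math.abs(next[v] - rank[v]));
    }
    rank = next;
    gss += 1;
    if (maxDelta < threshold) break; // converged before reaching maxGSS
  }
  return { rank, gss };
}

// Usage: a directed 3-cycle is already at its fixed point, so one superstep suffices.
console.log(pageRank({ a: ["b"], b: ["c"], c: ["a"] }, { maxGSS: 100, threshold: 1e-8 }));
```

A vertex with no incoming edges (like `c` in `{ a: ["b"], b: ["a"], c: ["a"] }`) ends up with only the teleport share `(1 - damping) / n`.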
● Pregel support since 2014
● Predefined algorithms
○ Could be extended via C++
● Same platform used for PPA
Challenges
Add and modify Algorithms
Programmable Pregel Algorithms (PPA)
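// Start a custom (AIR-based) algorithm; graphName must name an existing graph
const pregel = require("@arangodb/pregel");
const pregelID = pregel.start("air", graphName, "<custom-algorithm>"); // placeholder for the algorithm definition
const status = pregel.status(pregelID); // query the state of this execution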
● Add/modify algorithms on the fly
○ Without writing C++ code
○ Without restarting the database
● As with Pregel in general, efficiency depends on sharding
○ SmartGraphs
○ Required: co-location of vertices and edges
Custom Algorithm
{
"resultField": "<string>",
"maxGSS": "<number>",
"dataAccess": {
"writeVertex": "<program>",
"readVertex": "<array>",
"readEdge": "<array>"
},
"vertexAccumulators": "<object>",
"globalAccumulators": "<object>",
"customAccumulators": "<object>",
"phases": "<array>"
}
Accumulators
Accumulators consume and process the messages sent to them during the
computational phases of a superstep (initProgram, updateProgram, onPreStep,
onPostStep). All messages are processed after the superstep is done.
● max: stores the maximum of all messages received.
● min: stores the minimum of all messages received.
● sum: sums up all messages received.
● and: computes the logical AND of all messages received.
● or: computes the logical OR of all messages received.
● store: holds the last received value (non-deterministic).
● list: stores all received values in a list (order is non-deterministic).
● custom: a user-defined accumulator, declared in customAccumulators.
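The built-in accumulator kinds above can be read as reducers over the batch of messages a vertex received in one superstep. The sketch below models only that semantics; the `accumulators` table and `accumulate` function are hypothetical names, not ArangoDB's implementation, and it processes messages in array order, whereas in a distributed run the arrival order (and therefore the result of `store` and the order of `list`) is non-deterministic.

```javascript
// Illustrative semantics of the built-in accumulator kinds.
// ASSUMPTION: each kind is a reducer over all messages of one superstep.
const accumulators = {
  max: (msgs) => Math.max(...msgs),
  min: (msgs) => Math.min(...msgs),
  sum: (msgs) => msgs.reduce((acc, m) => acc + m, 0),
  and: (msgs) => msgs.every(Boolean), // logical AND over all messages
  or: (msgs) => msgs.some(Boolean),   // logical OR over all messages
  // "store" keeps the last message; with real message delivery this is non-deterministic.
  store: (msgs) => msgs[msgs.length - 1],
  // "list" keeps every message; their order is non-deterministic in practice.
  list: (msgs) => [...msgs],
};

function accumulate(kind, messages) {
  const reduce = accumulators[kind];
  if (!reduce) throw new Error(`unknown accumulator: ${kind}`);
  return reduce(messages);
}

// Usage
console.log(accumulate("max", [3, 7, 5])); // 7
console.log(accumulate("sum", [3, 7, 5])); // 15
console.log(accumulate("or", [false, true])); // true
```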
● resultField (string, optional): Name of the document attribute in which the result is stored. Each vertex receives its computed result under this attribute.
● maxGSS (number, required): The maximum number of global supersteps. Once this number of supersteps is reached, the Pregel execution stops.
● dataAccess (object, optional): Defines writeVertex, readVertex and readEdge.
○ writeVertex: A program that writes the results into the vertices. If writeVertex is used, resultField is ignored.
○ readVertex: An array of strings and/or arrays of strings, where each array represents a path.
■ string: A single top-level attribute.
■ array of strings: A nested path.
○ readEdge: An array of strings and/or arrays of strings, where each array represents a path.
■ string: A single top-level attribute (not nested).
■ array of strings: A nested path.
● vertexAccumulators (object, optional): Definitions of all vertex accumulators used.
● globalAccumulators (object, optional): Definitions of all global accumulators used. Global accumulators can access variables at a shared, global level.
● customAccumulators (object, optional): Definitions of all custom accumulators used.
● phases (array): An array of one or more phase definitions.
● debug (optional): See Debugging.
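The readVertex/readEdge notation (a string for a top-level attribute, an array of strings for a nested path) can be illustrated with a small projection helper. `project` is a hypothetical function, and keying the result by the dot-joined path is an assumption made only for readability; this is not how ArangoDB exposes the loaded attributes.

```javascript
// Hypothetical helper: project a document according to a readVertex/readEdge-style spec.
// A string selects a top-level attribute; an array of strings is a nested path.
// ASSUMPTION: results are keyed by the dot-joined path; missing paths yield undefined.
function project(doc, spec) {
  const result = {};
  for (const entry of spec) {
    const path = typeof entry === "string" ? [entry] : entry;
    let value = doc;
    for (const key of path) {
      // Stop descending (yield undefined) as soon as the path leaves object territory.
      value = value != null && typeof value === "object" ? value[key] : undefined;
    }
    result[path.join(".")] = value;
  }
  return result;
}

// Usage
const vertex = { _key: "v1", name: "alice", address: { city: "Cologne" } };
console.log(project(vertex, ["name", ["address", "city"]]));
// { name: 'alice', 'address.city': 'Cologne' }
```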
Phases - Execution order
Step 1: Initialization
1. onPreStep (Conductor, executed on Coordinator instances)
2. initProgram (Worker, executed on DB-Server instances)
3. onPostStep (Conductor)
Steps 2 to n: Computation
1. onPreStep (Conductor)
2. updateProgram (Worker)
3. onPostStep (Conductor)
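The execution order above can be traced with a minimal driver loop. `runPhase` and the plain-callback shape of `phase` are hypothetical; the sketch runs everything in one process and ignores that the conductor hooks run on a Coordinator and the worker programs on DB-Servers. Superstep numbering starts at 1 to match the slide.

```javascript
// Sketch of the per-phase execution order: the first superstep runs initProgram,
// every later superstep runs updateProgram, each wrapped by onPreStep/onPostStep.
// ASSUMPTION: hooks are optional plain callbacks receiving the superstep number.
function runPhase(phase, maxGSS) {
  const trace = [];
  for (let gss = 1; gss <= maxGSS; gss++) {
    trace.push(`${gss}:onPreStep`);       // Conductor
    phase.onPreStep?.(gss);
    const program = gss === 1 ? "initProgram" : "updateProgram";
    trace.push(`${gss}:${program}`);      // Worker
    phase[program]?.(gss);
    trace.push(`${gss}:onPostStep`);      // Conductor
    phase.onPostStep?.(gss);
  }
  return trace;
}

// Usage
console.log(runPhase({}, 2));
// [ '1:onPreStep', '1:initProgram', '1:onPostStep',
//   '2:onPreStep', '2:updateProgram', '2:onPostStep' ]
```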
Program - Arango Intermediate Representation (AIR)
A Lisp-like intermediate representation, expressed in
JSON and supporting JSON's data types
Specification
● Language Primitives
○ Basic Algebraic Operators
○ Logical operators
○ Comparison operators
○ Lists
○ Sort
○ Dicts
○ Lambdas
○ Reduce
○ Utilities
○ Functional
○ Variables
○ Debug operators
● Math Library
● Special Form
○ let statement
○ seq statement
○ if statement
○ match statement
○ for-each statement
○ quote and quote-splice statements
○ quasi-quote, unquote and unquote-splice statements
○ cons statement
○ and / or statements
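To illustrate what "Lisp-like, represented in JSON" means, here is a toy evaluator for programs written as nested JSON arrays whose first element is the operator. It also shows why special forms such as if and let are listed separately from primitives: they decide which arguments get evaluated. The operator names and semantics ("var", "let", "if", "+", ...) are assumptions chosen for illustration and are not taken from the AIR specification.

```javascript
// Toy evaluator for a Lisp-like language written as JSON: ["op", arg1, arg2, ...].
// ASSUMPTION: operator names and semantics are illustrative, not the AIR spec.
function evaluate(expr, env = {}) {
  // Atoms (numbers, strings, booleans, null, objects) evaluate to themselves.
  if (!Array.isArray(expr)) return expr;
  const [op, ...args] = expr;
  switch (op) {
    // Special forms: they control which arguments are evaluated.
    case "var":
      return env[args[0]];
    case "if": {
      const [cond, thenExpr, elseExpr] = args;
      return evaluate(cond, env) ? evaluate(thenExpr, env) : evaluate(elseExpr, env);
    }
    case "let": {
      // ["let", [[name, valueExpr], ...], body]; bindings are evaluated in order.
      const [bindings, body] = args;
      const scope = { ...env };
      for (const [name, valueExpr] of bindings) scope[name] = evaluate(valueExpr, scope);
      return evaluate(body, scope);
    }
    default: {
      // Primitives: evaluate all arguments first, then apply the operator.
      const vals = args.map((a) => evaluate(a, env));
      switch (op) {
        case "+": return vals.reduce((a, b) => a + b, 0);
        case "*": return vals.reduce((a, b) => a * b, 1);
        case "-": return vals.length === 1 ? -vals[0] : vals.slice(1).reduce((a, b) => a - b, vals[0]);
        case "<": return vals[0] < vals[1];
        case "max": return Math.max(...vals);
        default: throw new Error(`unknown operator: ${op}`);
      }
    }
  }
}

// Usage: x = 4, y = x * 2, result = y - x
console.log(evaluate(["let", [["x", 4], ["y", ["*", ["var", "x"], 2]]], ["-", ["var", "y"], ["var", "x"]]])); // 4
```

Because "if" only evaluates the taken branch, `["if", ["<", 1, 2], "yes", ["no-such-op"]]` returns "yes" without ever touching the invalid else branch.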
Pregelator
A simple IDE implemented as a Foxx service
https://github.com/arangodb-foxx/pregelator
Custom Pregel Algorithms in ArangoDB
PPA: What is next?
- Gather Feedback
- In particular: use cases
- Missing functions & functionality
- User-friendly Front-End language
- Improve Scale/Performance of underlying
Pregel platform
- Algorithm library
- Blog post (including a Jupyter example)
ArangoDB 3.8 (end of year)
- Experimental Feature
- Initial Library
ArangoDB 3.9 (Q1 21)
- Draft for Front-End
- Extended Library
- Platform Improvements
ArangoDB 4.0 (Mid 21)
- GA
Pregel vs AQL
When to (not) use Pregel…
- Can the algorithm be expressed efficiently in Pregel?
- Counter-example: topological sort
- Is the graph large enough to be worth loading into Pregel?
● AQL: all data models (Graph, Document, Key-Value, Search, …); online queries
● Pregel: iterative graph processing; large graphs, multiple iterations
How can I start?
● Docker Image: arangodb/enterprise-preview:3.8.0-milestone.3
● Check existing algorithms
● Preview documentation
● Give Feedback
○ https://slack.arangodb.com/ -> custom-pregel
Thanks for listening!
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
Test-drive Oasis
14 days for free