rantibi/data_engineer_exercise

XRef Report pipe

Part I - Query Executor

You should implement a Java application that:

Takes the following as input:

  • Query with parameters (for example: :from)
  • Parameters and their values (for example: -Dfrom='2019-01-05')
  • Output format (for example: CSV)
  • Output file path

Executes the query against the provided data source and writes the result to the output file path in the provided output format.

  1. You should implement only PostgresDB support as the data source, but make sure that it will be easy to add many different data sources.
  2. You should implement only CSV (with header) as the output format, but make sure that it will be easy to add many different formats (for example, JSON).
  3. Keep in mind that you may want to combine different data sources with different output formats.
  4. You are welcome to use any external library (for example, JDBC/JDBI).
  5. The application should be ready for use in production.

For testing your code you can use the following public database, and you can execute your code using the following query:

select * from xref where timestamp > :from;
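The combination requirement in point 3 above suggests keeping the data source and the output format behind separate interfaces, so any source can be paired with any format. A minimal sketch of that idea (all interface and class names here are illustrative, not prescribed by the exercise):

```java
import java.util.List;
import java.util.Map;

// A data source executes a parameterized query and returns rows as
// column-name -> value maps. A JDBC/JDBI-backed PostgresDataSource
// would implement this; it is omitted here to keep the sketch self-contained.
interface DataSource {
    List<Map<String, Object>> execute(String query, Map<String, Object> params);
}

// An output format renders rows to a string. CSV below; a JsonFormat
// would be one more implementation of this interface.
interface OutputFormat {
    String format(List<String> columns, List<Map<String, Object>> rows);
}

class CsvFormat implements OutputFormat {
    @Override
    public String format(List<String> columns, List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", columns)).append("\n"); // header line
        for (Map<String, Object> row : rows) {
            for (int i = 0; i < columns.size(); i++) {
                if (i > 0) sb.append(",");
                Object value = row.get(columns.get(i));
                sb.append(value == null ? "" : value.toString());
            }
            sb.append("\n");
        }
        return sb.toString();
    }
}

// The executor only composes the two abstractions, so any
// DataSource can be combined with any OutputFormat.
class QueryExecutor {
    private final DataSource source;
    private final OutputFormat format;

    QueryExecutor(DataSource source, OutputFormat format) {
        this.source = source;
        this.format = format;
    }

    String run(String query, Map<String, Object> params, List<String> columns) {
        return format.format(columns, source.execute(query, params));
    }
}
```

With this shape, adding JSON output or a second database means adding one implementation class without touching the executor.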

Part II - XRef Report

Using Airflow, you should generate an hourly CSV with the new records in the xref table and send the file by email.

High level steps

  1. (Optional) Fork this repository, clone it, and use it to implement the next steps (it should help you)
  2. Install the Airflow Docker setup
  3. Implement an Airflow DAG xref_pipe_dag.py with 2 tasks and their dependencies:
  • query_executor - Generate an hourly CSV with new records in the xref table
  • xref_report - Send the file that was generated by query_executor by email (to a fake email address) (use the following if you want)
  4. Share your solution using GitHub (data_engineer_exercise + Java code)
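The two tasks above and their dependency might be wired up roughly as follows. This is only a sketch: the operator choices, the jar path, the output path, and the email address are illustrative assumptions, not prescribed by the exercise; the import paths match the Airflow 1.10.x image used below.

```python
# Hypothetical sketch of xref_pipe_dag.py for Airflow 1.10.x.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.email_operator import EmailOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "xref_pipe",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",  # one CSV of new records per hour
    catchup=False,
)

# Run the Part I query executor, selecting only records newer than the
# previous run; {{ prev_execution_date }} and {{ ts_nodash }} are
# built-in Airflow template macros.
query_executor = BashOperator(
    task_id="query_executor",
    bash_command=(
        "java -jar /usr/local/airflow/query-executor.jar "  # assumed jar path
        "-Dfrom='{{ prev_execution_date }}' "
        "--format CSV --output /tmp/xref_{{ ts_nodash }}.csv"
    ),
    dag=dag,
)

# Email the generated CSV to a fake address.
xref_report = EmailOperator(
    task_id="xref_report",
    to="fake@example.com",
    subject="XRef hourly report {{ ds }}",
    html_content="New xref records attached.",
    files=["/tmp/xref_{{ ts_nodash }}.csv"],
    dag=dag,
)

query_executor >> xref_report  # report runs only after the CSV exists
```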

Installation instruction (Optional)

To keep this task from becoming too complex, we have provided a few steps that should help you prepare the Airflow environment:

  1. Fork this repository and clone it
  2. Install Docker
  3. Install the Airflow Docker setup with LocalExecutor and Java:
cd <PATH_TO_DATA_ENGINEER_EXERCISE_FOLDER>
docker-compose -f ./airflow/docker-compose-LocalExecutor.yml up -d

In order to install Java, we extended the puckel/docker-airflow:1.10.2 image. The original Docker image can be found here if you want to read about it (you don't really need to, but it might help if you run into problems).

  4. Go to http://localhost:8080 and make sure that example_dag.py and its task print "Hello World!"
  5. Now you can implement the tasks and create the new DAG
