Spark courses

With Spark, data is read into memory, operations are performed, and the results are written back, resulting in faster execution. Learn core principles and common packages on DataCamp.

Recommended for Spark beginners

Build your Spark skills with interactive courses curated by real-world experts

Course

Foundations of PySpark

Intermediate skill level
4 hours
501
Learn to implement distributed data management and machine learning in Spark using the PySpark package.

Browse Spark courses and tracks

Course

Introduction to PySpark

Intermediate skill level
4 hours
5K
Master PySpark to handle big data with ease: learn to process, query, and optimize massive datasets for powerful analytics!

Course

Machine Learning with PySpark

Advanced skill level
4 hours
927
Make predictions from data with Apache Spark. Covers decision trees, logistic regression, linear regression, ensembles, and pipelines.

Course

Foundations of PySpark

Intermediate skill level
4 hours
501
Learn to implement distributed data management and machine learning in Spark using the PySpark package.

Course

Introduction to Spark SQL in Python

Advanced skill level
4 hours
455
Learn how to manipulate data in Spark and create machine learning feature sets using SQL in Python.

Course

Feature Engineering with PySpark

Advanced skill level
4 hours
446
Dig into the hands-on practice of data cleaning and feature engineering, the core work on which data scientists spend 70–80% of their time.

Related resources on Spark

Blog

The Top 20 Spark Interview Questions

Essential Spark interview questions with example answers for job-seekers, data professionals, and hiring managers.

Tim Lu

Blog

Flink vs. Spark: A Comprehensive Comparison

Comparing Flink vs. Spark, two open-source frameworks at the forefront of batch and stream processing.

Maria Eugenia Inzaugarat

8 min

Tutorial

Pyspark Tutorial: Getting Started with Pyspark

Discover what PySpark is and how it can be used, with examples.

Natassha Selvaraj

10 min


Ready to apply your skills?

Projects allow you to apply your knowledge to a wide range of datasets to solve real-world problems in your browser

Frequently asked questions

Which Spark course is the best for absolute beginners?

For new learners, DataCamp has three introductory Spark courses across the most popular programming languages:

Introduction to PySpark 

Introduction to Spark with sparklyr in R 

Introduction to Spark SQL in Python

Do I need any prior experience to take a Spark course?

You’ll need to have completed an introductory course in the programming language you plan to use with Spark.

DataCamp offers one for each language:

Introduction to Python

Introduction to R

Introduction to SQL

Beyond that, anyone can get started with Spark through simple, interactive exercises on DataCamp.

What is PySpark used for?

If you're already familiar with Python and libraries such as pandas, then PySpark is a good tool to learn for building more scalable analyses and pipelines.

Apache Spark is a computational engine that handles very large datasets by splitting them up and processing the pieces in parallel, typically in batches.

Spark is written in Scala, and PySpark was released so that Spark can be used from Python.
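
The "split the data, process the pieces in parallel, combine the results" idea can be sketched in plain Python. This is only an illustration of the pattern using the standard library, not PySpark code and not how Spark is implemented. The function names `partition`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical, and the worker threads here stand in for the cluster machines that Spark would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a list into roughly equal, contiguous chunks."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Work done independently on a single partition."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_partitions=4):
    """Process each partition in its own worker, then combine the results."""
    if not data:
        return 0
    chunks = partition(data, num_partitions)
    # Threads are an assumption made for simplicity; they illustrate the
    # pattern but do not give true CPU parallelism in standard CPython.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 101))))  # prints 338350
```

Each partition is handled without reference to the others, which is what lets an engine like Spark spread the same kind of work across many machines.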

How can Spark help my career?

You’ll gain the ability to analyze data and train machine learning models on large-scale datasets—a valuable skill for becoming a data scientist. 

Having the expertise to work with big data frameworks like Apache Spark will set you apart.

What is Apache Spark?

Apache Spark is an open-source, distributed processing system used for big data workloads. 

It uses in-memory caching and optimized query execution to run fast analytic queries against data of any size.

It provides development APIs in Java, Scala, Python, and R, and supports code reuse across multiple workloads—batch processing, interactive queries, real-time analytics, machine learning, and graph processing.
