When you’re dealing with massive amounts of data (hundreds of terabytes or petabytes), the traditional ways of handling data start to break down.
You’ll need a distributed system, and you’ll have to orchestrate its components to suit your workload.
Typically, big data solutions involve one or more of the following types of workloads.
- Batch processing of data in storage
- Real-time processing of data from a stream
- Interactive exploration of data
- Predictive analytics and machine learning
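
To make the difference between the first two workload types concrete, here is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not a distributed implementation: the function names `batch_average` and `streaming_average` and the sample readings are hypothetical, and a real big data system would spread this work across many nodes.

```python
from typing import Iterable, Iterator, List


def batch_average(stored_readings: List[float]) -> float:
    """Batch: the full dataset is already in storage, so compute one result over all of it."""
    return sum(stored_readings) / len(stored_readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated result as each record arrives, keeping only a small running state."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    # Hypothetical sample data, used only for illustration.
    data = [10.0, 20.0, 30.0, 40.0]
    print(batch_average(data))
    print(list(streaming_average(iter(data))))
```

The batch function needs the whole dataset before it can answer, while the streaming function produces an up-to-date answer after every record and never holds more than a counter and a running total.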