12 MUST KNOW BIG DATA TERMS – TO IMPRESS YOUR DATE (OR WHOEVER)

Big Data

Big Data can be intimidating! This infographic will help you feel at home with Big Data.

1 ALGORITHM

An algorithm is a mathematical formula or step-by-step statistical process used to perform an analysis of data.
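To make the idea concrete, here is a minimal sketch of one of the simplest statistical algorithms, the arithmetic mean; the function name is illustrative, not taken from the infographic.

```python
# A tiny algorithm: a fixed sequence of steps (sum, then divide)
# that turns a data set into a single summary statistic.
def mean(values):
    """Compute the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # 5.0
```

Even sophisticated analytics pipelines are, at bottom, compositions of well-defined steps like this one.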

2 BIG DATA ANALYTICS

Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.

3 DATA LAKE

A data lake is a large repository of enterprise-wide data stored in its raw format.

4 DATA WAREHOUSE

A data warehouse is also a repository of enterprise-wide data, but in a structured format.

5 DATA MINING

Data mining is about finding meaningful patterns and deriving insights from large sets of data using sophisticated pattern-recognition techniques.

6 DATA SCIENTIST

A data scientist is someone who can make sense of big data by extracting raw data, massaging it, and coming up with insights. Skills needed: statistics, computer science, analytics, creativity, story-telling, and an understanding of the business context.

7 HADOOP

Hadoop is an open source software framework that consists of a Hadoop Distributed File System (HDFS) and allows for storage, retrieval, and analysis of very large data sets using distributed hardware.

8 IN-MEMORY COMPUTING

In-memory computing is a technique for moving working datasets entirely into a cluster’s collective memory, avoiding writing intermediate calculations to disk and thus making processing much faster. Apache Spark is an example of this.

9 MACHINE LEARNING

Machine learning is a method of designing systems that can learn, adjust, and improve based on the data fed to them.
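"Learning from data" can be sketched in a few lines: the toy example below (all names are illustrative) fits a slope w so that predictions w·x match observed values, improving with each pass over the data via gradient descent.

```python
# Toy machine learning: repeatedly adjust a parameter w to reduce
# the mean squared error between predictions w*x and observations y.
def fit_slope(xs, ys, lr=0.01, epochs=500):
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step downhill; w improves each epoch
    return w

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]          # the true relationship is y = 2x
w = fit_slope(xs, ys)
print(round(w, 2))         # 2.0
```

The system is never told the rule y = 2x; it recovers it from the data, which is the essence of machine learning.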

10 MAPREDUCE

Hadoop MapReduce is a software framework for distributed processing of large data sets on compute clusters of commodity hardware.
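The MapReduce pattern itself is simple enough to sketch on a single machine (Hadoop's value is running it across a cluster). This hedged, illustrative example counts words: the map phase emits (key, value) pairs, and the reduce phase combines all values that share a key.

```python
from collections import defaultdict

# Single-machine sketch of MapReduce word count.
def map_phase(lines):
    # Map: turn each input record into (word, 1) pairs.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the values for each distinct key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data", "big clusters"]
print(reduce_phase(map_phase(lines)))  # {'big': 2, 'data': 1, 'clusters': 1}
```

In real Hadoop, the map tasks run in parallel on many nodes and the framework shuffles pairs by key to the reducers; the logic per phase stays this simple.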

11 NOSQL (NOT ONLY SQL)

NoSQL (Not Only SQL) refers to database management systems designed to handle large volumes of data that do not fit a fixed structure or schema.
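What "no fixed schema" means in practice can be mimicked with plain Python dictionaries (a sketch, not the API of any particular NoSQL database): documents in the same collection can carry different fields, unlike rows in a relational table.

```python
# Schema-less storage sketch: each "document" defines its own fields.
users = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Lin", "tags": ["admin"], "age": 34},  # different fields, still valid
]

# Query: find users that happen to have a "tags" field.
admins = [u for u in users if "tags" in u]
print([u["name"] for u in admins])  # ['Lin']
```

A relational database would force both records into one table with the same columns; a NoSQL document store accepts each record as-is.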

12 (APACHE) SPARK

Apache Spark is a fast, in-memory data processing engine to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.