Recently, we have been hearing a lot about the term "big data" and about the rapid spread of this field in the labor market. But have we ever wondered what big data actually is? In principle, there is more than one definition of the term; as the International Telecommunication Union (ITU) explains, there is no single precise definition of big data. In general, when we talk about big data, we are talking about data of multiple types, sources, and sizes.
Definitions:
Before we get into the definition of big data, we must first know what data is.
Data: the raw form of information before it is sorted, organized, and processed. In this preliminary form, it cannot be used until it has been processed.
Raw data can be divided into three types, contrasted in the short sketch after this list:
Structured data: data organized into tables or databases.
Unstructured data: the largest share of all data; it is the data people generate daily in the form of texts, photos, videos, messages, clicks on websites, and so on.
Semi-structured data: a kind of organized data that does not take the form of tables or databases, but still carries markers that describe its structure (XML and JSON files are common examples).
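To make the distinction concrete, here is a minimal Python sketch contrasting the three types; the records and values are invented purely for illustration.

import json

# Structured data: fixed fields per record, as in a database table.
structured = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

# Semi-structured data: not tabular, but self-describing through keys/tags
# (JSON and XML are the classic examples).
semi_structured = json.loads("""
{
  "user": "Alice",
  "posts": [
    {"text": "hello", "likes": 3},
    {"text": "big data!", "tags": ["data", "tech"]}
  ]
}
""")

# Unstructured data: free text, images, audio -- no predefined schema at all.
unstructured = "Just posted a new photo from the conference, link coming soon..."

print(type(structured), type(semi_structured), type(unstructured))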
Now what is big data?
Experts define big data as any dataset whose size exceeds the ability of traditional database tools to capture, share, transfer, store, manage, and analyze it within an acceptable time. From the point of view of service providers, big data refers to the tools and processes organizations need in order to handle large amounts of data for analysis. Both sides agree that it is data too formidable to be processed by traditional methods under the constraints just mentioned.
Here are some definitions from global organizations and bodies that describe big data:
Gartner Inc., an IT research and consulting firm, defines it as "large-scale, fast-flowing, and diverse information assets that require cost-effective, innovative processing methods to enhance insight and decision making."
IBM defines it this way: "Big data is created by everything around us at all times. Every digital process and every social media exchange produces big data, transmitted by systems, sensors, and mobile devices. Big data arrives from multiple sources at varying speed, size, and variety. To extract meaningful value from big data, we need optimal processing power, analytics capabilities, and skills."
The International Organization for Standardization (ISO) defines it as "a dataset or datasets with unique characteristics (such as volume, velocity, variety, variability, veracity, etc.) that cannot be processed efficiently using current and traditional technology in order to derive benefit from them."
The International Telecommunication Union (ITU) defines it as "datasets that are exceptionally voluminous, fast, or diverse compared to the types of datasets commonly used."
Because of the time, effort, and high cost that analyzing and processing big data requires, technicians have come to rely on artificial intelligence systems, which use sophisticated algorithms to learn, infer, and react to situations not explicitly programmed into the machine, as well as on cloud computing technologies to carry out this work.
Big data is usually measured in petabytes (a thousand terabytes) or exabytes (a million terabytes). According to IBM, as of 2012 we create approximately 2.5 quintillion bytes of data every day (a quintillion is a one followed by eighteen zeros).
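As a quick sanity check on these units, the following small Python snippet converts IBM's daily figure into the units used above (decimal powers of ten, as in the text):

# A quintillion is 10**18; an exabyte is 10**18 bytes.
QUINTILLION = 10**18
PETABYTE = 10**15   # a thousand terabytes
EXABYTE = 10**18    # a million terabytes

daily_bytes = 2.5 * QUINTILLION  # IBM's 2012 estimate of daily data creation

print(daily_bytes / EXABYTE)   # 2.5    -> about 2.5 exabytes per day
print(daily_bytes / PETABYTE)  # 2500.0 -> i.e., 2,500 petabytes per day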
As a side note, the term "big data" has been added to the Oxford Dictionary as a new English word.
Big data classification:
Many of us believe that big data is classified by size alone; in fact, it is classified according to the 3V's principle, which consists of:
Volume (size):
This is the volume of data extracted from a source, and it determines whether the data can be classified as big data; it may be the most important characteristic of big data analysis. Describing data as "big" does not pin down a specific amount, but as mentioned earlier, volume is usually measured in petabytes or exabytes. For perspective, by 2020 cyberspace was expected to hold approximately 40 zettabytes of data ready for analysis and information extraction, and an estimated 90% of the data in the world today was created during the last two years, by devices and by humans alike.
Variety:
This refers to the diversity of the extracted data, which helps users, whether researchers or analysts, choose the data appropriate for their field of study. It includes structured data held in databases as well as unstructured data, which owes its unstructured nature to sources such as pictures, clips, audio recordings, videos, SMS messages, call logs, GPS data, and much more. Such data takes time and effort to prepare properly for processing and analysis.
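To make the cost of variety concrete, here is a minimal Python sketch of that preparation step; the record shapes, field names, and the normalize helper are all invented for illustration, not taken from any real system.

import json

def normalize(record):
    # Map one heterogeneous raw record onto a single common shape.
    if isinstance(record, dict):              # structured row from a database
        return {"source": "db", "text": record.get("message", "")}
    try:                                      # semi-structured JSON from an API
        parsed = json.loads(record)
        return {"source": "api", "text": parsed.get("body", "")}
    except (TypeError, ValueError):
        pass
    return {"source": "raw", "text": str(record)}  # unstructured fallback

# A mixed feed of the kinds of sources listed above.
mixed_feed = [
    {"user_id": 7, "message": "call log entry"},   # database row
    '{"body": "GPS ping at 24.71N, 46.67E"}',      # JSON from an API
    "voice memo transcript: meeting at noon",      # free text
]

for item in mixed_feed:
    print(normalize(item))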