Pig Latin Basics | Hadoop | Big Data
691K views
Oct 24, 2024
Pig Latin Basics
0:00
In this video we are discussing Pig Latin basics
0:04
So, let us look at it in more detail: how to write code in Pig Latin
0:11
So we have made a list of rules, do's and don'ts, so that you can feel
0:16
more comfortable writing code in Pig Latin. The first one is that we will use the Grunt shell to write Pig Latin code
0:24
In the Grunt shell, we need to follow some rules to write Pig Latin code
0:29
So, let us go through those rules now. Pig Latin statements work with relations, and they can include expressions and schemas
0:38
So we can use different kinds of expressions and schemas in our Pig Latin statements
0:44
So we can use the different operators of Pig Latin to complete our task, and this list of operators
0:50
and operations we will be discussing next. Except for the load and store operations, every Pig Latin statement takes a relation as input and returns
0:58
another relation as output. Load and store are the two exceptions: load reads data from the file system to produce a relation, and store
1:05
writes a relation back to the file system. Until the dump operator is used, no MapReduce job will be executed. So we need the dump operator (or store) for the statements to actually get executed
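The deferred execution described here can be pictured with a small model. This is a minimal sketch in plain Python, not Pig's actual implementation: the class `LazyRelation`, the sample rows, and the field layout (name, age) are all hypothetical, and the Pig Latin statements in the comments are only rough equivalents.

```python
# Illustrative model only: plain Python, not how Pig is implemented.
class LazyRelation:
    """Records operations and runs them only when dump() is called."""

    def __init__(self, rows, plan=None):
        self._rows = rows          # source data (stands in for a LOAD)
        self._plan = plan or []    # pending operations, not yet executed

    def filter(self, predicate):
        # Returns a new relation; nothing is computed yet.
        return LazyRelation(self._rows, self._plan + [("filter", predicate)])

    def foreach(self, func):
        # Also only recorded, not executed.
        return LazyRelation(self._rows, self._plan + [("foreach", func)])

    def dump(self):
        # Only here is the recorded plan actually executed.
        rows = list(self._rows)
        for kind, fn in self._plan:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


executed = []  # records every row the predicate has actually seen


def is_adult(row):
    executed.append(row)
    return row[1] >= 18


students = LazyRelation([("asha", 21), ("ravi", 16), ("meena", 30)])
adults = students.filter(is_adult)         # like: adults = FILTER students BY age >= 18;
names = adults.foreach(lambda r: (r[0],))  # like: names = FOREACH adults GENERATE name;
calls_before_dump = len(executed)          # still 0: nothing has run so far
result = names.dump()                      # like: DUMP names;
```

In this model the predicate is not called at all until `dump()` runs, which mirrors the idea that the earlier statements only describe the work to be done.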
1:22
The next one is that we should use a semicolon at the end of each statement in Pig
1:28
So this semicolon is mandatory at the end of each and every Pig statement
1:36
Now, data types in Pig Latin. So what are the different data types and their
1:39
respective sizes, we will be discussing here. The table given below describes some of the
1:44
Pig Latin data types. So here we have int, which is a signed 32-bit integer
1:51
We have long, a signed 64-bit integer. We have float, a 32-bit floating point
1:58
number. The double is a 64-bit floating point number. That means float is occupying
2:04
4 bytes and double is occupying 8 bytes in memory. We have chararray, a character array or string
2:10
We have datetime, which represents a date and time. We have boolean; a boolean value will
2:16
be true or false as usual. We have bytearray, that is a byte array
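The sizes mentioned above can be checked with a short sketch. This uses Python's standard `struct` module as a stand-in for the same bit widths; it does not call Pig itself, and the helper `fits_int` is a hypothetical name used only for illustration.

```python
import struct

# Standard-size encodings ("<" prefix) with the widths named in the transcript.
sizes = {
    "int": struct.calcsize("<i"),     # signed 32-bit integer
    "long": struct.calcsize("<q"),    # signed 64-bit integer
    "float": struct.calcsize("<f"),   # 32-bit floating point
    "double": struct.calcsize("<d"),  # 64-bit floating point
}

# Value ranges that follow from the signed 32-bit and 64-bit widths.
int_min, int_max = -(2 ** 31), 2 ** 31 - 1
long_min, long_max = -(2 ** 63), 2 ** 63 - 1


def fits_int(value):
    """Return True if value fits in a signed 32-bit integer."""
    return int_min <= value <= int_max
```

A value that does not pass `fits_int` would need the long type instead of int.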
2:21
So in this way we have multiple data types, and we have provided their respective descriptions. Complex data types in Pig Latin: the table given below describes some of the complex Pig Latin data types
2:36
The first one is a tuple. A tuple means an ordered set of fields
2:41
So, fields contain values, and an ordered set of fields is known as a tuple
2:46
A bag means a collection of tuples. We also have the map. A map means a set of key-value pairs
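As a rough analogy, the three complex types can be pictured with Python's built-in containers. This is only a sketch under that analogy: the sample values are made up, and the Pig notation in the comments shows how such values are usually written.

```python
# Tuple: an ordered set of fields. In Pig notation: (asha,21)
student = ("asha", 21)

# Bag: a collection of tuples. In Pig notation: {(asha,21),(ravi,16),(asha,21)}
# A bag may hold duplicate tuples, so a list is used here rather than a set.
students_bag = [("asha", 21), ("ravi", 16), ("asha", 21)]

# Map: a set of key-value pairs. In Pig notation: [city#Pune,course#Hadoop]
profile = {"city": "Pune", "course": "Hadoop"}

# Fields of a tuple are addressed by position, map values by key.
name = student[0]
city = profile["city"]
```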
2:55
Next, relational operations in Pig Latin. So, let us discuss what the relational operations are
3:02
The following table describes the relational operators of Pig Latin
3:09
The first one is load: load data from HDFS or the local disk
3:15
We have store: save data to HDFS or onto the local disk
3:21
We have filter: remove unwanted rows. Distinct: remove duplicate rows. We have foreach ... generate, to generate
3:31
data transformations based on the columns of data, so for that we will be using
3:37
foreach and generate. We have stream: transform a relation with an external program. We have dump: print the data of a relation. So dump is one of these operators
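What filter, distinct and foreach ... generate do to the rows can be sketched on a small in-memory list. This is plain Python used as an analogy; the relation name `rows` and the fields (name, age) are assumptions, and the Pig Latin lines in the comments are approximate equivalents.

```python
# A small relation with fields (name, age); the data is made up.
rows = [("asha", 21), ("ravi", 16), ("asha", 21), ("meena", 30)]

# like: adults = FILTER rows BY age >= 18;   (remove unwanted rows)
adults = [r for r in rows if r[1] >= 18]

# like: unique_adults = DISTINCT adults;     (remove duplicate rows)
# sorted() is used only to make the order predictable in this sketch.
unique_adults = sorted(set(adults))

# like: names = FOREACH unique_adults GENERATE UPPER(name);
# (a transformation based on one column of the data)
names = [(name.upper(),) for name, age in unique_adults]
```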
3:53
So now let us go on with the relational operators in Pig Latin. We have join, to join two or more relations
4:01
We have cogroup, to group the data in two or more relations
4:06
So that is cogroup. We have the group operator, to group the data in a single relation
4:14
And then we have cross, which is the cross product of two or more relations
4:18
We have order, to arrange a relation in sorted order. We have limit, to get a limited number of tuples
4:25
Not all the tuples; we are going to get only a limited number of tuples
4:29
Union combines multiple relations into one. And then we have split, which splits one relation into multiple relations
4:37
So, with union we combine multiple relations into one, but here
4:42
in the case of split, one relation goes into multiple relations. We have describe, which describes the schema of a relation
4:51
So these are the different relational operators which are available in Pig Latin
4:56
And in this way we are concluding the Pig Latin basics
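Several of these operators can be sketched on small in-memory lists. Again this is plain Python used as an analogy, not Pig itself; the relations `students` and `depts` and their fields (name, age, dept, code) are hypothetical, and the Pig Latin lines in the comments are approximate equivalents.

```python
from collections import defaultdict

# Two small relations; the data and field names are made up.
students = [("asha", 21, "cs"), ("ravi", 16, "ee"), ("meena", 30, "cs")]
depts = [("cs", "Computer Science"), ("ee", "Electrical")]

# like: joined = JOIN students BY dept, depts BY code;
joined = [s + d for s in students for d in depts if s[2] == d[0]]

# like: by_dept = GROUP students BY dept;
by_dept = defaultdict(list)
for s in students:
    by_dept[s[2]].append(s)

# like: ordered = ORDER students BY age;
ordered = sorted(students, key=lambda s: s[1])

# like: youngest_two = LIMIT ordered 2;
youngest_two = ordered[:2]

# like: SPLIT students INTO minors IF age < 18, adults IF age >= 18;
minors = [s for s in students if s[1] < 18]
adults = [s for s in students if s[1] >= 18]

# like: recombined = UNION minors, adults;
recombined = minors + adults
```

Split and union are opposites in this sketch: splitting on age and then taking the union gives back the same tuples, though not necessarily in the original order.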