MapReduce and Design Patterns - Job Chaining Pattern Example
0:00
In this video we are discussing a job chaining pattern example.
0:05
So in this example we will provide posts.xml and users.xml. These two XML files will be required to perform two different tasks,
0:15
and these tasks are chained together using the job chaining pattern. So let us go through the coding and implementation to build a better concept of this topic.
0:26
In this example, we are going to implement the job chaining pattern, which falls under the meta pattern category of design patterns. Here we shall be dealing
0:36
with two XML files: posts.xml, which will be available under /input/post,
0:42
and users.xml, which will be available under /input/user. So here we are having
0:48
posts.xml, where /input/post is the path, and we shall have another one, users.xml,
0:55
under the path /input/user. Here is the content of our users.xml:
1:00
you can find that under the users tag we are having multiple rows, and
1:05
each and every row has many different attributes: Id, Reputation, CreationDate,
1:10
DisplayName, and so on. So let me also show you posts.xml; it is likewise
1:17
within the posts tag, and multiple rows are there. I have shown three of them, and each
1:21
and every row has got multiple attributes. So we shall open our Java program.
1:27
In our Java program we shall have two mapper classes: one class is our UserCountMapper,
1:33
and another one will be the UserBinningMapper. So let me open the Java program now. We will be having only one Java source file
1:41
here, and that source file will be having multiple inner classes and other methods.
1:48
It is JobChainingMRTask.java. We are having two mappers here, as I mentioned.
1:54
In JobChainingMRTask we are having some variables; they are all public static final String variables, and they have got initialized:
2:05
AVERAGE_CALC_GROUP, MULTIPLE_OUTPUTS_ABOVE_NAME, and MULTIPLE_OUTPUTS_BELOW_NAME.
2:11
These are the respective final String variables. We're having the first mapper, that is the UserCountMapper.
2:20
This UserCountMapper extends Mapper. It is also having one static final String, that is RECORDS_COUNTER_NAME,
2:30
initialized with "records", and also one final LongWritable object, ONE, which gets
2:38
initialized as new LongWritable(1), along with an output key of the
2:44
type Text. Now we are overriding the map method. Within this map method we are having a
2:50
Map object, xmlParsed, which will be initialized with the return value of the XML-to-map helper method. This method will take
3:01
one XML row and return the respective HashMap object. Then xmlParsed.get("OwnerUserId")
3:07
will initialize the String variable userId. If the userId is not
3:13
equal to null, then we set the output key with the userId, and then we're writing context.write(
3:18
outkey, ONE); that is the key-value pair we're writing. And context.getCounter(
3:23
AVERAGE_CALC_GROUP, RECORDS_COUNTER_NAME).increment(1), so the counter value will
3:29
be incremented by one accordingly; getCounter followed by increment(1) raises the value by one.
3:35
We shall have a try-catch block enclosing this map method body.
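Based on this walkthrough, here is a minimal sketch of how the counting mapper might look. The transformXmlToMap helper name and the Mapper type parameters are assumptions, since the video does not show them on screen:

```java
// Inside JobChainingMRTask.java; assumes imports such as java.io.IOException,
// java.util.Map, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.Mapper
public static class UserCountMapper extends Mapper<Object, Text, Text, LongWritable> {
    public static final String RECORDS_COUNTER_NAME = "records";
    private static final LongWritable ONE = new LongWritable(1);
    private Text outkey = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // Parse one XML row into an attribute map (helper name assumed)
            Map<String, String> xmlParsed = transformXmlToMap(value.toString());
            String userId = xmlParsed.get("OwnerUserId");
            if (userId != null) {
                outkey.set(userId);
                context.write(outkey, ONE);  // emit (userId, 1)
                context.getCounter(AVERAGE_CALC_GROUP,
                        RECORDS_COUNTER_NAME).increment(1);  // count every post record
            }
        } catch (Exception e) {
            // the video wraps the map body in try-catch to skip malformed rows
        }
    }
}
```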
3:40
Now let me go to the reducer. Here we're having this UserCountReducer, having a
3:46
static final String USERS_COUNTER_NAME and a LongWritable output value, which has
3:52
been instantiated. Now we shall override the reduce method here. The reduce method is available under the Reducer class,
4:01
and so our reducer class will be overriding this reduce method. In the reduce method, we're having context.
4:09
getCounter(AVERAGE_CALC_GROUP, USERS_COUNTER_NAME).increment(1);
4:16
so here the respective counter will be incremented by one, once per distinct user. Now what I'm doing is making one integer variable, sum, initialized with
4:26
0. Then I shall go for each LongWritable
4:31
value within values. Whenever we are going over these values, they form an
4:36
iterable object, so we can iterate over it: sum = sum +
4:41
value.get(). So we are finding the total of the counts emitted for this user, that is the user's post count, and the
4:47
output value will be set with the sum, and to the context we'll be writing the key-value pair. That is the purpose of
4:54
our reducer class.
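Under the same assumptions (including the counter name strings, which the video does not spell out), a minimal sketch of this counting reducer:

```java
public static class UserCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    public static final String USERS_COUNTER_NAME = "users";
    private LongWritable outvalue = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Each reduce group corresponds to one distinct user
        context.getCounter(AVERAGE_CALC_GROUP, USERS_COUNTER_NAME).increment(1);

        int sum = 0;
        for (LongWritable value : values) {
            sum += value.get();          // add up the 1s: total posts for this user
        }
        outvalue.set(sum);
        context.write(key, outvalue);    // emit (userId, postCount)
    }
}
```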
4:59
I told you that we'll be having two mappers; the next mapper is our UserBinningMapper, which extends Mapper. Within this we are defining one
5:05
static final String AVERAGE_POSTS_PER_USER, initialized with "avg.posts.
5:11
per.user", and one Random class object, instantiated with the respective constructor. For this average we are going to have a method, that is setAveragePostsPerUser.
5:29
This particular method returns void, and it is having only one single statement, which
5:37
is job.getConfiguration().set(AVERAGE_POSTS_PER_USER, Double.toString(avg));
5:45
here, for this particular avg, I'm just converting the double to a string before
5:50
writing it onto the job's configuration. So that is double to string. We are having
5:55
another method, that is getAveragePostsPerUser, and here you can see
6:00
Double.parseDouble(conf.get(AVERAGE_POSTS_PER_USER)); in this way
6:06
the string will be converted back to a double. So there we did double to string, and
6:10
here we are converting it back again, string to double.
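A sketch of these two configuration helpers, assuming the property key described above:

```java
public static final String AVERAGE_POSTS_PER_USER = "avg.posts.per.user";

// Store the computed average in the job configuration (double -> String)
public static void setAveragePostsPerUser(Job job, double avg) {
    job.getConfiguration().set(AVERAGE_POSTS_PER_USER, Double.toString(avg));
}

// Read the average back out of the configuration (String -> double)
public static double getAveragePostsPerUser(Configuration conf) {
    return Double.parseDouble(conf.get(AVERAGE_POSTS_PER_USER));
}
```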
6:15
We're having some variables there: a private double avg, which we're initializing with zero, then a MultipleOutputs<Text, Text>
6:25
mos equal to null, a Text output key equal to new Text(), and also an output value of type Text.
6:32
You can find them there. We're also having a HashMap: userReputationMap is a HashMap object,
6:41
and it will be containing keys and values both of String type. Next we are overriding the setup method.
6:50
In the setup method we are defining this avg and mos, initializing them with
6:56
getAveragePostsPerUser(context.getConfiguration()) and new MultipleOutputs<Text, Text>(context) respectively.
7:05
In this way avg and mos get initialized; we declared them earlier, but here we are actually initializing them. Within a
7:15
try-catch block we have defined a URI array, files, from context
7:21
.getCacheFiles(), so it is just getting instantiated. There is a URI
7:26
array, and the name of this URI array is files. Now, if files
7:34
is null or files.length is zero, that means
7:39
the array is missing or empty; then we throw a Runtime-
7:42
Exception with the message that user information is not found in the distributed cache. Otherwise we are just defining one
7:49
File object for the users file, as new File() with the path given as "./
7:55
users". We have defined one BufferedReader object, which will be
8:01
reading data from that users file into a
8:05
String line. Now we are executing one while loop, and within that we are having a
8:11
try-catch block. Within the try block we're having the parsed map: again we are
8:15
calling that XML-to-map helper, which will be parsing the line and
8:20
instantiating parsed, that is a Map object. The String userId is
8:25
equal to parsed.get("Id"); so from the Id key we are initializing
8:31
the userId, and from parsed.get("Reputation") we are
8:36
updating the String object reputation. In this way we are initializing these two
8:42
String objects. If userId is not null and reputation is not
8:48
null, we map the userId to the reputation: userReputationMap.put(userId,
8:55
reputation). So here we are gathering the userId and the reputation,
8:59
and they will be put into the map. You can find that we
9:04
have defined this userReputationMap already; this loop fills in its content.
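Putting these pieces together, a sketch of the fields and setup method of UserBinningMapper. The field names, the Random object's name, and the XML helper are assumptions based on the walkthrough:

```java
// Fields of UserBinningMapper, as described above
private double avg = 0;
private MultipleOutputs<Text, Text> mos = null;
private Text outkey = new Text();
private Text outvalue = new Text();
private HashMap<String, String> userReputationMap = new HashMap<String, String>();
private Random rndm = new Random();   // used later for the fallback reputation

@Override
protected void setup(Context context) {
    avg = getAveragePostsPerUser(context.getConfiguration());
    mos = new MultipleOutputs<Text, Text>(context);
    try {
        URI[] files = context.getCacheFiles();
        if (files == null || files.length == 0) {
            throw new RuntimeException(
                    "User information is not found in the distributed cache");
        }
        // The users file is linked into the working directory as "./users"
        BufferedReader rdr = new BufferedReader(new FileReader(new File("./users")));
        String line;
        while ((line = rdr.readLine()) != null) {
            try {
                Map<String, String> parsed = transformXmlToMap(line);  // helper name assumed
                String userId = parsed.get("Id");
                String reputation = parsed.get("Reputation");
                if (userId != null && reputation != null) {
                    userReputationMap.put(userId, reputation);
                }
            } catch (Exception e) {
                // skip malformed user rows
            }
        }
        rdr.close();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```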
9:09
Now let us look at what we shall do in the map
9:17
method. The userId will be taken from tokens[0]; token
9:22
zero means the first element, because tokens is a String array that has been
9:26
split on the tab character. And postCount is equal to Integer.parseInt(tokens
9:32
[1]), that means the second value in tokens, because the line was
9:36
split depending upon the tab. So we set the output key with the userId.
9:42
If userReputationMap.get(userId) is not null, that means
9:49
it is having some value, so in the output value we're converting this postCount to
9:53
long, then concatenating it with a tab and then userReputationMap.get(
9:59
userId). In this way we're just putting the key-value
10:03
pair together in the respective output value; there is the value we are
10:07
putting into this output value, just doing this concatenation separated by the tab. Else, if userReputationMap
10:18
.get(userId) is null, then the integer reputation takes a
10:22
random value within the range of 10,000, and then in the output value again we are
10:28
converting the postCount to long, separated by a tab from the reputation converted to a string. So the reputation
10:34
will be converted to a string, they will be concatenated, and that will be put in the output value when the userId was not found in the map. Next we check whether the post count, taken as a double, is below the average.
10:46
Here we are creating two output files.
10:51
If it is below, that is mos.write(MULTIPLE_OUTPUTS_BELOW_NAME,
10:56
outkey, outvalue, MULTIPLE_OUTPUTS_BELOW_NAME + "/part");
11:02
so we are writing "/part" concatenated onto MULTIPLE_OUTPUTS_BELOW_NAME. Similarly we are doing the same in the else branch, but there we'll be going for
11:10
MULTIPLE_OUTPUTS_ABOVE_NAME, and again we are doing the concatenation with the "/
11:16
part" string. In the cleanup we are just closing this mos object.
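A sketch of this binning map method and its cleanup, continuing the assumptions above (the random fallback reputation follows what the video describes):

```java
@Override
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // Each input line is the counting job's output: userId <TAB> postCount
    String[] tokens = value.toString().split("\t");
    String userId = tokens[0];
    int postCount = Integer.parseInt(tokens[1]);

    outkey.set(userId);
    if (userReputationMap.get(userId) != null) {
        // Known user: pair the post count with the cached reputation
        outvalue.set((long) postCount + "\t" + userReputationMap.get(userId));
    } else {
        // Unknown user: fall back to a random reputation in the 10,000 range
        int reputation = rndm.nextInt(10000);
        outvalue.set((long) postCount + "\t" + Integer.toString(reputation));
    }

    // Bin the record into one of the two named outputs
    if ((double) postCount < avg) {
        mos.write(MULTIPLE_OUTPUTS_BELOW_NAME, outkey, outvalue,
                MULTIPLE_OUTPUTS_BELOW_NAME + "/part");
    } else {
        mos.write(MULTIPLE_OUTPUTS_ABOVE_NAME, outkey, outvalue,
                MULTIPLE_OUTPUTS_ABOVE_NAME + "/part");
    }
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();   // flush and close the MultipleOutputs streams
}
```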
11:25
Now let me discuss the main function. Within the main function we require three parameters
11:30
to be passed, so we're taking the command line arguments: we'll be having the posts
11:35
directory, the users directory, and also the output
11:39
directory; three arguments, and before them JobChainingMRTask is the class
11:42
name. So the post input path will be obtained from args[0], the user input path will
11:48
be obtained from args[1] as we passed them, and the intermediate
11:53
output directory will be the path args[2] plus "_int";
11:59
so for intermediate I have gone for this "_int" suffix. And the final output directory will be
12:05
initialized from args[2], that is the third argument we will be passing. In
12:10
this way it comes from the third argument. Now we are defining
12:14
one Job instance here, and the job name will be "JobChaining-Counting";
12:19
that is the name of the job. And we shall define the
12:25
jar by class, that is setJarByClass with the jar's class name, then the mapper class, which is
12:31
UserCountMapper, then the combiner class, which is the LongSumReducer class, a combiner we're taking from the Hadoop library,
12:40
and also the reducer class will be UserCountReducer.class. Then we'll be going for the set
12:46
output key class, that is the Text type, and the set output value class will be the LongWritable class.
12:53
Everything we have defined as we usually do. The set input format class will be the TextInputFormat
12:58
class. Now we are going for addInputPath on the counting job, which we defined
13:04
earlier; countingJob is the Job instance, and for it we're having this post
13:09
input path. All these variables we defined earlier. Now countingJob's set
13:15
output format class is TextOutputFormat.class, and the set output path is given
13:20
countingJob with the intermediate output directory. Then we're just checking what the code is:
13:27
if the completion is true, then the code will be zero; otherwise the
13:32
code will be one. When this particular code is zero, that means the job has completed successfully.
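A sketch of how this first, counting job might be wired up in main; the Path variable names are assumptions matching the walkthrough:

```java
Path postInputPath = new Path(args[0]);
Path userInputPath = new Path(args[1]);
Path outputDirIntermediate = new Path(args[2] + "_int");
Path outputDir = new Path(args[2]);

// Job 1: count posts per user
Job countingJob = Job.getInstance(new Configuration(), "JobChaining-Counting");
countingJob.setJarByClass(JobChainingMRTask.class);

countingJob.setMapperClass(UserCountMapper.class);
countingJob.setCombinerClass(LongSumReducer.class);   // stock Hadoop combiner
countingJob.setReducerClass(UserCountReducer.class);

countingJob.setOutputKeyClass(Text.class);
countingJob.setOutputValueClass(LongWritable.class);

countingJob.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(countingJob, postInputPath);

countingJob.setOutputFormatClass(TextOutputFormat.class);
TextOutputFormat.setOutputPath(countingJob, outputDirIntermediate);

int code = countingJob.waitForCompletion(true) ? 0 : 1;
```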
13:39
Then we go into this if block. So double recordCount is
13:45
equal to (double) countingJob.getCounters().findCounter(
13:51
AVERAGE_CALC_GROUP, UserCountMapper.RECORDS_COUNTER_NAME).getValue(). I'm just
13:58
highlighting it so that you can follow: we calculate the average posts per user by getting the counter
14:04
values. So here we're just initializing this
14:09
recordCount: we are passing the respective
14:15
parameters and then calling getValue(). Similarly we'll be going for the userCount:
14:21
for the userCount it is countingJob.getCounters().findCounter(
14:26
AVERAGE_CALC_GROUP, ...) with the reducer's users counter name as the other parameter, and then .getValue(). So the average
14:31
posts per user will be recordCount divided by userCount.
14:38
There is our average. That's why we give the second job the name "JobChaining-Binning";
14:43
we followed the same naming earlier. So now we are going for this job,
14:47
the binningJob, and now we shall go for setMapperClass for this binning job,
14:53
and also setAveragePostsPerUser. We are just defining all these things, setting the respective parameters:
15:01
the mapper and the average posts per user. The number of reduce tasks here is zero; we don't require any reducer. Then the set input format class and add input path; these are the mandatory operations, using everything we defined earlier. Then we shall go for MultipleOutputs.addNamedOutput, where binningJob is the job, with MULTIPLE_OUTPUTS_BELOW_NAME, and we'll be having TextOutputFormat.class and the Text classes. So here we are passing the respective parameters, whatever is required for
15:33
addNamedOutput. These are the other parameters; I'm just scrolling
15:38
to the right so you can see what we passed there; all the corresponding class
15:44
names we have mentioned. And then MultipleOutputs.setCountersEnabled(
15:50
binningJob, true); now I'm just enabling the counters, and they will be working
15:55
on the job. TextOutputFormat.setOutputPath(binningJob, outputDir); that is the output directory, whatever we have mentioned. Then FileStatus[] user
16:07
Files = FileSystem.get(conf).listStatus(userInputPath). So we are
16:13
defining one FileStatus array, that is userFiles, and from these user
16:18
files we are picking up each and every status. So for each FileStatus status we are
16:22
going for a for loop, and within this for loop we are just going for binningJob.addCacheFile(new URI(status.getPath().toUri()
16:31
+ "#users")); I'm just converting the path to a URI and also appending this "#users" fragment, so
16:38
the file will be added into the binning job's cache via addCache
16:44
File. These are the respective arguments we are passing
16:51
to the URI constructor. Now we are going to set the code depending upon the completion: if the completion is perfectly okay,
17:00
then it shall be zero, otherwise one. Then we shall go for deleting the
17:05
intermediate output directory with delete(outputDirIntermediate, true); we're not keeping this intermediate folder, we're just deleting it, and then System.exit(code).
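Continuing the earlier sketch, the second, binning job might be wired up like this; the counter reads and the "#users" cache symlink follow the walkthrough, while the exact variable names remain assumptions (and main is assumed to declare throws Exception):

```java
// Read the two counters from the finished counting job
double recordCount = (double) countingJob.getCounters().findCounter(
        AVERAGE_CALC_GROUP, UserCountMapper.RECORDS_COUNTER_NAME).getValue();
double userCount = (double) countingJob.getCounters().findCounter(
        AVERAGE_CALC_GROUP, UserCountReducer.USERS_COUNTER_NAME).getValue();

// Job 2: bin users above/below the average posts per user
Job binningJob = Job.getInstance(new Configuration(), "JobChaining-Binning");
binningJob.setJarByClass(JobChainingMRTask.class);

binningJob.setMapperClass(UserBinningMapper.class);
setAveragePostsPerUser(binningJob, recordCount / userCount);
binningJob.setNumReduceTasks(0);                      // map-only job

binningJob.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(binningJob, outputDirIntermediate);

// Two named outputs: one bin below the average, one above
MultipleOutputs.addNamedOutput(binningJob, MULTIPLE_OUTPUTS_BELOW_NAME,
        TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(binningJob, MULTIPLE_OUTPUTS_ABOVE_NAME,
        TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.setCountersEnabled(binningJob, true);

TextOutputFormat.setOutputPath(binningJob, outputDir);

// Ship every users file through the distributed cache under the "#users" symlink
FileStatus[] userFiles = FileSystem.get(new Configuration()).listStatus(userInputPath);
for (FileStatus status : userFiles) {
    binningJob.addCacheFile(new URI(status.getPath().toUri() + "#users"));
}

code = binningJob.waitForCompletion(true) ? 0 : 1;

// Remove the intermediate directory and exit with the final code
FileSystem.get(new Configuration()).delete(outputDirIntermediate, true);
System.exit(code);
```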
17:14
It is a long program, as you have seen. Now, as usual, we'll be creating the
17:19
jar file. The jar file can be created by just right-clicking on the package name, then
17:24
going for Export, and then setting the jar file name and the path. We have already created the jar
17:29
file, so we can skip that step here; we have shown how to create jar files in other videos.
17:34
So: package, right-button click, Export, select JAR and the folder, and go for Next, Next.
17:41
Now we shall execute the command. So let me go back to the console and show you
17:47
how the program can be executed. Let me go for clear at first. Yes,
17:54
now we are typing the command. We shall go for hadoop jar,
18:02
and then the jar path, that is the MapReduce_Design_Pattern
18:09
jar files folder, and then we'll be mentioning the jar file name, whatever we gave;
18:14
that is our meta pattern jar. Then we shall give the package name and then the class name, as package
18:28
name dot class name. So we give the package name now, then a dot, then the class name,
18:39
which is JobChainingMRTask. Okay, now we shall give the
18:52
respective paths: the first input path is /input/post, the other input is /input
19:06
/user, then the output path, which is /output. So we are executing this one. Here
19:14
we're having two tasks working on posts.xml and users.xml, and later
19:21
the results will be combined. We are going to create two folders under this
19:27
/output directory where the binned results will be obtained, and that we are going to
19:33
demonstrate. Yes, the program got executed successfully; no error occurred. Here we are having the respective paths. If you go to the
19:47
output folder, you can find we are having two directories, above average and below average; the default part file that got created is having size
19:56
zero, since all records went to the named outputs. Going for the above-average directory at first, see, it is containing one
20:02
part file, and below average is also containing another part file. So we shall
20:07
go for the content; I'm going to show you the content. One is the above-average slash part star,
20:35
and see, here we are having this above-average part file content.
20:44
Now we'll be going for below average, the same way. So I think you have gone through the steps as we followed them;
20:51
we have explained the program line by line, and we have got this file ready.
20:56
This is the other file's content, that is the below-average one. Now we shall delete
21:02
the output folder as we did in earlier cases also. I think you have enjoyed this
21:07
video and got a clear concept of the topic. Let me type the command; the output folder is now getting
21:15
deleted. Thanks for watching.