MapReduce and Design Patterns - Chain Folding Pattern Example
Oct 18, 2024
0:00
In this video we are going to discuss a chain folding pattern example.
0:05
We shall go through one practical implementation of this concept. In this example we will provide posts.xml and also users.xml.
0:15
These two XML files will be used to separate the users whose reputation is above
0:20
3000 from those below 3000. This is the assignment we are going to implement in our next demonstration
0:29
using Java code, and how to run it and how to get the output will also be shown
0:34
in the demonstration. This chain folding pattern example falls under
0:43
the metapatterns category of design patterns. The problem is that we are supposed to separate users whose reputation is greater
0:52
than 3000 from those below 3000. In this way we shall have to separate them.
0:59
We are going to have two XML files: one is posts.xml and the other is users.xml.
1:07
posts.xml is under the folder /input/post and users.xml is under
1:14
the folder /input/user. These are the two XML files we will be using, and here you see we're
1:22
having /input/post/posts.xml, and let me go to /input/user/
1:29
users.xml. Let me show you the respective file contents. At first we're going to
1:35
users.xml: under the users tag we're having a number of rows, two of which I have
1:40
shown, and each row has multiple attributes such as Id, Reputation, CreationDate,
1:47
DisplayName and so on. There will be many such rows in the main
1:52
users.xml. Now we shall discuss the content of
1:58
posts.xml: under the posts tag we're having multiple row tags with multiple attributes,
2:03
Id, PostTypeId, AcceptedAnswerId, CreationDate, Score, and many other attributes are
2:10
there. So this posts.xml will also have multiple records. Now we shall
2:16
concentrate on our Java program. This Java program will have only a single
2:22
class; only one Java program we shall be having, and here we'll be having the
2:26
respective mappers and the reducer. Here the name of the Java
2:32
class is ChainFoldingMRTask. Within
2:37
this Java class we have defined multiple public static final strings, that is, the
2:42
average calculation group, multiple outputs below 3000, and multiple outputs above 3000. So multiple final strings are defined, and they have been
2:54
initialized with these values. So you see multiple outputs below 3000: for below
3:00
3000 and for above 3000, whatever values we put against
3:05
them will be used to create our folders. Now we shall go to one inner class, that is
3:11
the UserCountMapper, and this UserCountMapper extends the MapReduceBase
3:18
class and implements the Mapper interface. Implementing the Mapper interface means it must implement the map method. Within the class we have
3:27
defined one static final string, the records counter name, initialized with
3:32
the value "records"; it
3:36
is final, which means it is a constant. We are also defining a private static final LongWritable,
3:41
ONE = new LongWritable(1), so here ONE will
3:47
contain 1, and also the output key of type Text. So these are the
3:52
variables defined within this class. Now, within the map
3:57
method, which we are overriding, we have xmlParsed, which is
4:01
initialized with the return value of the xmlToMap function. This function
4:06
converts the XML content to a HashMap object, and that
4:11
initializes xmlParsed. Now from xmlParsed we get the user ID, that is, the OwnerUserId, and that initializes userId here.
4:23
So we get the user ID from the posts. If the user ID is not equal to null, then we set it on
4:31
the output key and collect the output, that is, the output key and ONE.
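The parsing and counting steps described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class name `XmlRowSketch`, the method `countKey`, the regular expression, and the sample row are all made up here, and the video's own `xmlToMap` may be implemented differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlRowSketch {
    // Matches name="value" pairs. Assumes double-quoted attributes and
    // does not unescape XML entities such as &amp;.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Converts one <row ... /> line into a map from attribute name to value. */
    static Map<String, String> xmlToMap(String xmlRow) {
        Map<String, String> attributes = new HashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(xmlRow);
        while (matcher.find()) {
            attributes.put(matcher.group(1), matcher.group(2));
        }
        return attributes;
    }

    /** Returns the key the count mapper would emit with the value 1, or null to skip the row. */
    static String countKey(String postRow) {
        return xmlToMap(postRow).get("OwnerUserId");
    }

    public static void main(String[] args) {
        // Made-up sample row, not taken from the video's data set.
        String row = "<row Id=\"1\" PostTypeId=\"1\" OwnerUserId=\"42\" />";
        System.out.println(countKey(row) + "\t1");
    }
}
```

A row without an OwnerUserId attribute yields null, which corresponds to the null check in the map method.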
4:37
Now we are going to have another class extending the MapReduceBase class and implementing
4:43
the Mapper interface. This is the respective class here, that is, the
4:48
UserReputationMapper. In this class we're having one output key of
4:53
type Text and one HashMap object, that is, userReputationMap, which
4:59
has been instantiated; its key-value pairs are of type String and String.
5:02
We're having the configure method, and we're overriding it. It first calls
5:08
userReputationMap.clear(), so we're clearing the reputation map, and then we are creating
5:13
one URI array named files, which is
5:19
initialized with DistributedCache.getCacheFiles. That method has been
5:25
deprecated, but it is working fine in our code, and job is the input argument.
5:29
If files is equal to null or files.length is equal to
5:34
0, that means the array is empty, and in that case we are throwing an exception;
5:39
otherwise we read all files in the distributed cache. So for each URI
5:46
contained in the files array, we define Path p =
5:51
new Path(uri) and FileSystem fs = FileSystem.get(job), so they are
5:57
getting instantiated. Now to read data from the file we have created one
6:02
BufferedReader object, rdr, which is required to read data
6:07
from the file. Now within the while loop we have xmlParsed, again initialized with xmlToMap,
6:13
which parses the line and
6:18
initializes xmlParsed. Within this while loop we are reading line by line.
6:23
So user has been initialized with the Id, and reputation has been initialized with the
6:28
Reputation from xmlParsed. Now we go to one if block: if user is not equal to
6:36
null and reputation is not equal to null, then userReputationMap.put(user, reputation), so we are putting that in the
6:43
respective map, user and reputation, and it has been enclosed within the try-catch block
6:49
properly. So I think you are getting my point about how we are doing this. Now, you know that we are
6:54
implementing the Mapper interface, so the map method has to be overridden here. Here we're
6:59
writing reputation = userReputationMap.get(key.toString()):
7:04
we're converting the key to a string, looking it up, and assigning the result to the string
7:08
reputation. Then, if the reputation is not equal to null,
7:12
the output key will hold the key, then a tab, then the reputation, and we call
7:17
output.collect(outputKey, value), so the key-value pair will be kept in
7:22
this output. So in this way the body of the map method has been written.
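The lookup that this mapper performs can be sketched in plain Java as follows. It is a minimal illustration, not the video's code: the class `ReputationEnrichmentSketch` and its methods `loadUser` and `enrichKey` are hypothetical names, and the in-memory map stands in for the data read from the distributed cache in the configure method.

```java
import java.util.HashMap;
import java.util.Map;

final class ReputationEnrichmentSketch {
    // Stands in for userReputationMap, which the video fills from users.xml.
    private final Map<String, String> userReputationMap = new HashMap<>();

    /** Mirrors the null checks done while reading users.xml line by line. */
    void loadUser(String userId, String reputation) {
        if (userId != null && reputation != null) {
            userReputationMap.put(userId, reputation);
        }
    }

    /** Returns "userId<TAB>reputation", or null when the user is unknown. */
    String enrichKey(String userId) {
        String reputation = userReputationMap.get(userId);
        return reputation == null ? null : userId + "\t" + reputation;
    }

    public static void main(String[] args) {
        ReputationEnrichmentSketch sketch = new ReputationEnrichmentSketch();
        sketch.loadUser("42", "3500"); // made-up sample values
        System.out.println(sketch.enrichKey("42"));
    }
}
```

Keeping the reputation inside the key is what lets the later binning step work on the key alone.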
7:27
The value is passed as an input argument. Next, we're extending MapReduceBase, so we are inheriting from it, and we
7:35
are also implementing the Reducer interface in LongSumReducer. So here we're bound to write or
7:43
define the body of the reduce method. Here int sum = 0, and while
7:50
values.hasNext() is true, that is, while more values are available, we add values.next().get() to sum,
7:54
so we are getting each value and adding it to the sum. Then
7:59
outputValue.set(sum), so the sum is kept in the output value, and output.collect(key, outputValue),
8:04
so we'll be writing the key-value pair to the output.
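The summing loop just described can be written in plain Java like this. The class name `LongSumSketch` is made up for illustration, and the sketch uses a `long` accumulator over plain `Long` values rather than the `int` and LongWritable mentioned in the video.

```java
import java.util.Arrays;
import java.util.Iterator;

final class LongSumSketch {
    /** Adds up every value for one key, as the reduce method does. */
    static long sum(Iterator<Long> values) {
        long sum = 0; // long instead of the video's int, to avoid overflow on large counts
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three posts by the same user, each mapped to the value 1.
        System.out.println(sum(Arrays.asList(1L, 1L, 1L).iterator()));
    }
}
```

Because addition is associative and commutative, the same logic can also serve as a combiner.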
8:08
So in this way the reduce method has been defined. Now we shall go to another mapper.
8:19
That is the UserBinMapper. After writing this key-value pair we'll be going to the UserBinMapper.
8:26
So let me go to that. This UserBinMapper extends MapReduceBase and implements the Mapper interface. Before the configure method, the MultipleOutputs object mos has
8:42
been initialized with null, and within configure we're writing mos =
8:46
new MultipleOutputs(conf), so we are passing the configuration as the
8:51
input, conf. We're bound to override the map method because we're
8:55
implementing the Mapper interface. Within the map method we're writing Integer.parseInt(key.toString().split("\t")[1]) and checking whether it is less than
9:07
3000. That means we're converting the key to a string, splitting it on the tab, and
9:13
taking the second value. If that value is below
9:18
3000, then we go for mos.getCollector with multiple outputs below 3000 and the
9:22
reporter, and call collect(key, value). Similarly the same thing happens for
9:29
multiple outputs above 3000, so that pair goes to the other
9:35
mos.getCollector. Now we come to the close method, and we are just closing mos within
9:40
a try-catch block as usual. The xmlToMap function we have discussed already.
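The binning decision can be sketched in plain Java as below. The class name `UserBinSketch` and the bin names `below3000` and `above3000` are assumptions for illustration; the video does not show the actual string values. The sketch also assumes an if/else, so a reputation of exactly 3000 lands in the above bin.

```java
final class UserBinSketch {
    // Hypothetical bin names; the real constants' values are not shown in the video.
    static final String BELOW = "below3000";
    static final String ABOVE = "above3000";
    static final int THRESHOLD = 3000;

    /** Picks the named output for a key of the form "userId<TAB>reputation". */
    static String binFor(String key) {
        int reputation = Integer.parseInt(key.split("\t")[1]);
        return reputation < THRESHOLD ? BELOW : ABOVE;
    }

    public static void main(String[] args) {
        System.out.println(binFor("42\t3500")); // made-up sample key
        System.out.println(binFor("7\t120"));
    }
}
```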
9:46
Let me come to the main function now. This main function requires three parameters:
9:51
after the class name we pass the posts folder, then the users
9:56
folder, and then the output folder, that is, args[0], args[1], and args[2]; otherwise
10:03
we call System.exit(2). We're
10:08
initializing the post input from argument zero, the user input from argument
10:14
one, and then the output directory from argument two. Now here we're setting
10:20
multiple mappers and reducers. At first we're calling setJarByClass,
10:25
that is, with ChainFoldingMRTask.class. Here we're going for
10:31
ChainMapper.addMapper; in total we are adding three mappers and one
10:36
reducer, and the respective parameters have been passed. So we're having the
10:41
config, UserCountMapper.class, LongWritable.class, Text.class, Text.class, and
10:47
LongWritable.class, so for the keys and values the respective class
10:53
types are mentioned. The class names are given in the full declaration, that is, the Text class and the LongWritable class,
11:02
and then new JobConf(false). So in this way you can find that we are
11:08
adding, with this ChainMapper, the
11:14
UserCountMapper.class. In these four lines we're doing all these things,
11:18
and the four lines above them are written the same way.
11:23
Now here we're going for another mapper: there we added the UserCountMapper, and here we are adding
11:30
the UserReputationMapper, and then the LongSumReducer. For the LongSumReducer
11:38
also we are passing the respective parameters, so we're adding it in the
11:43
ChainReducer. Now addMapper again, and this one is the UserBinMapper. Everything
11:47
was defined earlier, as we have seen.
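To see what the chain amounts to, here is a plain-Java sketch that runs the same four stages in one pass: count, enrich, sum, and bin. It deliberately does not use Hadoop's ChainMapper or ChainReducer; the class `ChainFoldingSketch`, the bin names, and the sample data in `demo` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ChainFoldingSketch {
    /** Runs count mapper, enrichment mapper, sum reducer, and bin mapper in sequence. */
    static Map<String, Map<String, Long>> run(List<String> postOwnerIds,
                                              Map<String, String> reputations) {
        Map<String, Long> sums = new TreeMap<>();
        for (String ownerId : postOwnerIds) {
            if (ownerId == null) continue;                 // mapper 1: emit (userId, 1)
            String reputation = reputations.get(ownerId);  // mapper 2: look up reputation
            if (reputation == null) continue;
            sums.merge(ownerId + "\t" + reputation, 1L, Long::sum); // reducer: sum per key
        }
        Map<String, Map<String, Long>> bins = new TreeMap<>();
        for (Map.Entry<String, Long> entry : sums.entrySet()) { // mapper 3: bin after reduce
            int reputation = Integer.parseInt(entry.getKey().split("\t")[1]);
            String bin = reputation < 3000 ? "below3000" : "above3000"; // hypothetical names
            bins.computeIfAbsent(bin, b -> new TreeMap<>()).put(entry.getKey(), entry.getValue());
        }
        return bins;
    }

    /** Made-up sample data: user 1 has two posts, user 2 has one, user 9 is unknown. */
    static Map<String, Map<String, Long>> demo() {
        Map<String, String> reputations = new HashMap<>();
        reputations.put("1", "3500");
        reputations.put("2", "100");
        return run(Arrays.asList("1", "2", "1", "9", null), reputations);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of folding the stages into one job is that intermediate records pass directly from one stage to the next instead of being written out and read back between separate jobs.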
11:52
Now we shall go for setCombiner: here the reducer class is also used as the combiner class. Then config.setInputFormat, that is,
12:03
TextInputFormat.class, and TextInputFormat.setInputPaths with config and the post input. So all the
12:11
variables you defined earlier are being used. Next, configure multiple outputs.
12:16
So setOutputFormat, that is, NullOutputFormat.class, and FileOutputFormat.setOutputPath with
12:23
config and the output dir, which we got from the third command-line
12:29
argument. Then addNamedOutput and again addNamedOutput, for multiple outputs
12:35
above 3000 and multiple outputs below 3000, with the respective classes as
12:39
required. I'm making the right calls; you can just check. Now we shall go
12:47
for the output key class, which will be the Text class, and the output value class, which will be the LongWritable class. Now we add the user files to the distributed cache. So userFiles
13:02
is a FileStatus array: FileSystem.get(config).listStatus(userInput). And
13:08
then, from this userFiles FileStatus array, for each
13:13
status we call DistributedCache.addCacheFile, passing status.getPath().toUri()
13:18
and config. So we're adding all these cache files to the
13:27
distributed cache. Now we shall look at the job;
13:35
we're defining one job. After adding all this to the config we
13:41
defined earlier, we call JobClient.runJob(config),
13:46
so a RunningJob object has been created. While the job has not been
13:51
completed, we wait for five seconds and check again. After
13:56
coming out of the loop, if the job is successful we return zero,
14:01
otherwise one. So in this way we have explained the whole Java program line by line.
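The wait-and-check loop at the end of the main function can be sketched in plain Java as follows. The `JobHandle` interface is a hypothetical stand-in for the running job object, and `completesAfter` is a fake used only to exercise the loop; neither is part of Hadoop. The poll interval is a parameter here, where the video waits five seconds.

```java
final class JobPollingSketch {
    /** Hypothetical stand-in for the running job; not a Hadoop type. */
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
    }

    /** Polls until the job completes, then returns 0 on success and 1 on failure. */
    static int waitForExitCode(JobHandle job, long pollMillis) {
        try {
            while (!job.isComplete()) {
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return 1;
        }
        return job.isSuccessful() ? 0 : 1;
    }

    /** Fake job that reports completion on the given poll number. */
    static JobHandle completesAfter(final int polls, final boolean success) {
        return new JobHandle() {
            private int seen = 0;
            public boolean isComplete() { return ++seen >= polls; }
            public boolean isSuccessful() { return success; }
        };
    }

    public static void main(String[] args) {
        System.out.println(waitForExitCode(completesAfter(3, true), 10L));
    }
}
```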
14:06
Whatever variables we defined earlier have been used later on.
14:09
Now the task, you know, is that we are going to create the JAR file. That
14:14
means right-click the package, click Export, and choose the JAR folder and the JAR file name. We have done that earlier, so we are not
14:21
going to do it again; I have shown it in the other videos multiple times. Now let me go
14:27
to the command to execute. So hadoop jar, then the JAR folder and
14:35
the JAR file name, then the package and the class name, then the input folders:
14:42
/input/post, which contains posts.xml,
14:47
/input/user, which contains users.xml, and /output, where the two folders, below
14:53
3000 and above 3000, will be created.
14:59
So we are executing it. We know that here we are separating
15:04
users depending upon the reputation, whether it is above 3000 or below 3000; that was our task in this
15:12
chain folding pattern example. So let me see
15:17
whether those two folders have been created under the output folder or not, and
15:21
then we shall see the part file contents accordingly. So the command is
15:27
executing. I hope it will not show any errors. Yes, it has executed perfectly.
15:34
Let me come here. We shall go to the output folder; the output
15:42
is there. Now we are having above 3000 and below 3000, so for above
15:49
3000 and below 3000 two part files have been created. Now we shall
15:55
print them. So we shall execute hdfs dfs -cat with those two file names
16:12
that we are having under the output folder. So that is above*; I'm just putting a star,
16:19
I'm not writing the full file name. Here you can find the respective output
16:24
of that file under the output folder, and now we shall go for
16:29
below*, Enter. So these outputs have been shown here. Now let me delete this output folder.
16:48
I hope that you have understood how the process has to be carried out, how to write the Java code,
16:54
and how to execute the command. So let me delete this output folder. I think now you are getting the
16:58
confidence to run this example. Thanks for watching this video.