CTP project and CISC4900 Presentation, Signalyze
Jan 1, 2025
Our repository: https://github.com/WillSnakeTaka/Signalyze For more info please visit: https://www.mysteppe.com/category/music-releases/ Spotify: https://open.spotify.com/intl-de/artist/58kAMATJwufWYn7yp0hIk4?si=V023A9ISR2qaNRwezn_rPg SoundCloud: https://soundcloud.com/gav-khugzhem-taka SuperLink: https://ffm.bio/w5q94qq
Video Transcript
0:06
Hi, welcome to our presentation for CISC 4900. Today our topic is Signalyze. It's a
0:13
project we built throughout the semester. Our teammates are Marco Chen,
0:20
Rigoberto Pun, and me. In our project overview, Rigoberto
0:27
handled text-to-speech, and Marco Chen handled the gesture and hand landmark
0:33
detection sections. So here is a recap of our presentation
0:40
night, and this is [Music]
0:50
us. All right, now let's start with the
0:55
presentation. First we go through planning. [Music]
1:02
Rigoberto worked on the text-to-speech documentation and recommendations for the project, Hugging Face, and
1:09
Streamlit testing for text-to-speech, plus project suggestions. Marco
1:15
worked on OpenCV sign language training, data collection, local testing, and the implementation on Streamlit. I worked
1:22
on smaller projects, such as H5 training, testing different platforms
1:28
for usability, and putting the project together: integrating text-to-speech,
1:33
OpenCV, and also exploring librosa for future audio
1:40
augmentation. Today's presentation goes like this: first I will introduce voice processing
1:48
testing. This is the most important part of my learning, where I learned everything from audio processing
1:56
for music and also how to train an H5 file on different datasets. The
2:02
second stage is testing librosa and its uses in music analysis,
2:07
which led to my understanding of voice processing, which later helped
2:13
me integrate text-to-speech for the integration of the sign
2:18
language and text-to-speech, and to understand basically how text-to-speech and audio handling
2:25
work. The third step introduces image processing, then video processing, then a talk about platforms
2:31
and failures. In the last step we present our project and a
2:37
recap of our presentation night with my teammates. So now let me introduce my work
2:45
on the project. For my part, my contribution was planning. Rigoberto
2:51
and Marco formatted documentation listing what we needed to use for this
2:57
project, and I planned and made smaller projects to make each step work; then we put the pieces together,
3:04
similar to how you build a puzzle. The first thing is
3:10
understanding the basic steps; once we understand the basic concepts, we can find different resources and start
3:16
coding and testing. The testing platforms included Unity, Flask, Hugging
3:21
Face, Streamlit, Amazon's cloud, Render, Heroku, and various
3:29
other platforms for implementation. Experimenting with librosa is currently in progress. Getting these
3:34
things to work tonight is actually a small personal
3:40
achievement. The problem statement: sign language is a crucial communication
3:45
tool for the Deaf and hard-of-hearing communities, but many people are
3:50
not fluent in it, so bridging this gap through technology can enhance
3:56
inclusivity and accessibility. That's why we came up with the idea of
4:01
using OpenCV to recognize hand signs. Our solution: first we
4:08
build a system that captures sign gestures using a camera, then recognizes the gestures through machine
4:14
learning models, then converts the gestures to text and text-to-speech in real
4:20
time. The technology we use is OpenCV, which does real-time video
4:26
processing. MediaPipe was also really helpful, introduced by
4:31
Husam, with its hand tracking and gesture recognition system. TensorFlow and Keras handle deep
4:39
learning and model development, and finally we use Python libraries for
4:44
text-to-speech, NumPy, and Flask. My first step was to start with
4:52
learning something small. Handling waveforms is the most important part, because we have to go from one-dimensional
4:58
to three-dimensional data. Voice is the basic part, then images, using CNN formats and different
5:06
formats, and after voice and image we go through video, and how video and voice can
5:13
integrate together. For me, music study is my specialty in this
5:18
small project, so I used music study and voice handling, inspired by Husam's
5:23
repository for an emotion recognition system, to start this project. First, let's go through Husam's
5:31
really amazing repository. Husam is our teaching assistant, who has been
5:37
really helpful, and he's so great. They did something using machine
5:43
learning and librosa to recognize different speech and generate generalized emotions, then used these
5:50
emotions to detect other voices using the machine learning model. What I
5:55
learned from this really amazing model: they inspect and explore data, select and
6:02
engineer features, build and train models, and evaluate models. Basically they import
6:09
different libraries and use machine learning to learn things, and they use pydub's AudioSegment to represent
6:16
audio that can be manipulated using Python code.
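As a minimal sketch of that pydub AudioSegment idea (the file names here are hypothetical, and pydub needs FFmpeg installed to read MP3s):

```python
from pydub import AudioSegment

# Load an MP3 as a Python object that can be sliced and transformed.
clip = AudioSegment.from_mp3("my_track.mp3")  # hypothetical file name
intro = clip[:10_000]                         # slicing is in milliseconds
quieter = intro - 6                           # reduce volume by 6 dB
quieter.export("intro.wav", format="wav")     # write the segment out as WAV
```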
6:24
The first step is to inspect and explore the data. They train with really great data, like from Kaggle, but for me,
6:30
I wanted to learn something about my own music, so I did my own style of training, which I will show you later. So,
6:37
importing the dataset is another issue, because there are always formatting problems, and then comes preparing the data and
6:45
checking its readiness. This part basically deals with data labeling,
6:51
data cleaning, and data preparation, but for my part I don't
6:56
really need those things, because I labeled my own data: I know what my emotion was, so I put the files into different folders and
7:02
just checked them later. Then comes preparing the data to create the dataset. This is another
7:10
step: after you label different things, for example female, male, and neutral emotion, and all those,
7:17
you replace the raw labels with different kinds of typed values,
7:22
for example replacing the string "female" with a numeric label, that type of thing. But in my case I'm working with small data, my own data,
7:29
so it doesn't apply; their project is more ambitious, while our project is a little bit smaller. Then they
7:36
use data visualization. Currently I don't have this feature yet, but
7:42
my own data only deals with six different prototypes, which I will show you
7:47
later. Yeah, so basically they use librosa and a machine learning
7:54
system to map the different types of waves, and then it learns
8:00
different kinds of features, for example the frequency and also the seconds, so they can study these
8:08
types of features to recognize different patterns they haven't seen before, and use them to predict things.
8:14
This is a really great system. As for these spectrogram visuals, I had already seen
8:19
them during Professor Cohen's class on electronic music, so
8:25
this is also another very interesting thing for learning about different sounds.
8:31
Yeah, so basically this is the idea, and it's from Husam's repo, which I studied extensively. Now let's go
8:38
back to the learning process. Learning FFmpeg is another
8:44
formatting thing: it can convert between MP3 and WAV, because WAV is bigger, or not so much bigger
8:51
as more precise, while MP3 is smaller. I also learned so many things with
8:57
Professor Cohen about timbre and tempo, and how electronic signals pass through
9:03
the other stages. I forget the name of the software where you can paste a track in and it reflects
9:09
different pictures. Also, wait, sorry: labeling my
9:16
own data is another process. I labeled my own data, and I'm going to show you my basic data labeling setup.
9:24
For my small project, I have this music data. I picked
9:30
different pieces I composed during the past three years, because I know those styles are unique to me, so I labeled them with
9:37
different labels and put each music style in its own folder to learn from. That's how I labeled my own data.
9:45
Then we're supposed to generate a CSV format to record the data points;
9:50
for example, software is used to extract the tempo, the chroma, and those spectrogram-like
9:57
features. After this process we can generate an H5 machine learning
10:03
model and use this model to predict other things.
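To make that extraction step concrete, here is a minimal librosa sketch for one track (the file name is hypothetical; the row counts match the shapes described later in the talk):

```python
import librosa

# Load one labeled track at librosa's default 22.05 kHz sample rate.
y, sr = librosa.load("traditional/track01.wav")

# The three feature families discussed in the talk:
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)       # scalar tempo estimate (BPM)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # shape (12, n_frames)

print(tempo, mfcc.shape, chroma.shape)
```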
10:09
Now, back to the topic: what is the FFmpeg format? FFmpeg is a
10:14
powerful open-source tool for handling multimedia data. It can convert audio and
10:20
video formats, and resample, re-encode, and normalize audio files. Basically, for my case, it's
10:27
about converting between MP3 and WAV, because the WAV format has better quality and it's
10:34
easier to extract the music information from.
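A minimal sketch of that conversion step, calling FFmpeg from Python (it assumes ffmpeg is on the PATH; resampling to 22.05 kHz mono is an assumption chosen to match librosa's default load rate):

```python
import subprocess

# Convert MP3 to WAV: resample to 22.05 kHz and mix down to mono.
subprocess.run(
    ["ffmpeg", "-i", "input.mp3", "-ar", "22050", "-ac", "1", "output.wav"],
    check=True,  # raise an error if FFmpeg fails
)
```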
10:41
Now, about handling unusual audio. When we're dealing with data, sorry,
10:48
when we're dealing with machine learning and data labeling, there will always be conflicts, and
10:54
they result in failures: files not being read into the CSV, or not being recorded
11:00
correctly. My plan was to force-read all files, and there's a process I learned
11:06
called a fallback mechanism. A feature set has the tempo, MFCCs, and chroma,
11:12
which are basically different ways of recording the waveform. If a file fails and has unexpected dimensions,
11:19
dimensions being, to the best of my knowledge, array shapes, then, for example, if tempo has only one axis and
11:26
another feature has two, the shapes will not match, so I have to force the data into shape. My
11:33
solution is that for any problematic file, the output is forced to consistent dimensions.
11:39
That's the first issue. And if something, for example electronic music, or some of my music with
11:46
really weird tones, is not recognized, then I put zeros in as an alternative placeholder. The zero is
11:54
still valid, because when the model reads through the data it will still recognize the pattern and use it
12:00
to predict other patterns, with no skipping. Sometimes these files get skipped when things aren't working, so we have to
12:07
write code saying that the issue must be logged and processing must continue for every
12:12
file, to make sure every file has something recorded; even if a value isn't there, it puts a zero there
12:18
instead of skipping the whole file and messing up the whole
12:23
dataset. The next stage is debugging. Debugging the failures usually requires a uniform shape
12:30
for all features: basically, tempo, MFCC, and chroma are reshaped to
12:37
fixed dimensions, for example tempo has 1 row, MFCC has 13 rows, and chroma
12:42
has 12 rows. I'll skip past this part, since it's more of a personal discussion.
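Here is a minimal sketch of that fallback mechanism as described: every file yields a block with fixed rows (1 tempo + 13 MFCC + 12 chroma = 26) and a fixed width, and failures are logged and zero-filled rather than skipped. The N_FRAMES constant and the helper names are assumptions, not the talk's exact code:

```python
import numpy as np
import librosa

N_FRAMES = 5000  # fixed frame width so every file yields the same shape (an assumption)

def extract_fixed(path):
    """Return a (26, N_FRAMES) feature block; fall back to zeros instead of skipping."""
    def fit(a):
        # Pad or truncate a (rows, n_frames) feature to the fixed width.
        out = np.zeros((a.shape[0], N_FRAMES))
        width = min(a.shape[1], N_FRAMES)
        out[:, :width] = a[:, :width]
        return out

    try:
        y, sr = librosa.load(path)
        tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
        tempo_row = np.full((1, N_FRAMES), float(tempo))          # 1 row
        mfcc = fit(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13))   # 13 rows
        chroma = fit(librosa.feature.chroma_stft(y=y, sr=sr))     # 12 rows
        return np.vstack([tempo_row, mfcc, chroma])               # 1 + 13 + 12 = 26 rows
    except Exception as exc:
        print(f"fallback for {path}: {exc}")                      # log the issue, never skip
        return np.zeros((26, N_FRAMES))                           # valid zero placeholder
```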
12:49
This is my backlog. The biggest issue I hit was this dimension thing, because, for example,
12:56
when the pipeline read one of my pieces, the Professor Zerio
13:01
reference, an experimental thing, they didn't have the same dimensions: the array at index zero had two
13:08
dimensions and the other had one, so that's why I went through this process to force everything through, and
13:14
eventually it worked. For example, it says the tempo shape is (1,
13:19
5000), the MFCC shape is (13, 5000), and the chroma shape likewise. All of these
13:25
give you numbers, and these numbers were supposed to be converted into a CSV,
13:31
but the CSV failed. I'm going to explain why. There are file types: .npy is a binary file specific to NumPy and
13:38
optimized for storing arrays and metadata, for example these large arrays. This is why I'm going to skip this
13:45
part; you can read it [Music] later. So, analyzing why my CSV file
13:52
didn't work: it didn't work because our features are multi-dimensional, for example (26, 5000), which
13:58
CSV can't handle by itself, because CSV is essentially flat,
14:04
and flattening those arrays into a CSV would lose the structural integrity of the data and complicate processing
14:11
and model training. That's why we use the NumPy .npy structure instead of that
14:16
structure, and Marco also came up with other structures, which was really great for his final hand
14:23
recognition system with MediaPipe.
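A short sketch of the .npy round trip that motivated this choice (the file names are hypothetical, and extract_fixed is the helper sketched earlier):

```python
import numpy as np

features = extract_fixed("blues/track02.wav")  # a (26, 5000) block, per the earlier sketch
np.save("track02.npy", features)               # binary .npy preserves dtype and shape

loaded = np.load("track02.npy")
assert loaded.shape == (26, 5000)              # the structure survives the round trip
# A CSV would force features.reshape(-1): one long row, row/column meaning lost.
```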
14:29
Now let's go to the next micro step: we've produced our first .npy file. Just imagine
14:37
it's similar to a CSV: it lists all the stats, like which
14:43
form this is, what this point is, and what number it is, and we can now train an H5 model to
14:48
recognize those patterns. Now we plan the training of the
14:54
models. The training first loads the dataset from the .npy file
15:00
and separates out the features and labels them, then we split them into training and testing sides, and after
15:07
these steps we're able to produce the H5 file. How it works: basically the model
15:12
learns patterns from the input features, and those features capture key characteristics of my music style, such as
15:19
rhythm, tone, and timbre, all compiled from the chroma and related
15:26
packages. These are my six classes; I call them traditional sentiment, crab
15:31
class, canon, innovative existence, BL, and the reference, which is the experimental one. Each
15:36
has its own ideas that don't exist in any other style. So after learning
15:42
my own patterns, it produces six styles labeled 0 to
15:47
5. Now I make it into an H5 file, so now we're going through the training
15:53
process, using neural network mapping to capture the patterns.
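The training plan just described (load the .npy dataset, separate features and labels, split into training and testing sets, train, save the H5 file) could look roughly like this in Keras; the file names, layer sizes, and epoch count are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

# One flattened feature block per track, plus style labels 0-5 (hypothetical files).
X = np.load("features.npy")   # e.g. shape (n_tracks, 26 * 5000)
y = np.load("labels.npy")     # e.g. shape (n_tracks,)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(6, activation="softmax"),  # six style classes, labels 0-5
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test))

model.save("style_model.h5")  # the H5 file discussed above
```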
15:59
[Music] So now we have this H5 file, built using my
16:04
"CSV," which was really the .npy file, the two-dimensional data, and it gives you an H5 file. This H5 file is the most
16:11
important part of any model training, because usually other people train tens of thousands of files into an
16:19
H5 file, so people can borrow that model to train their own model. I
16:24
learned how to do these things from scratch because I really like Husam's
16:29
approach: back then we didn't have Hugging Face, we didn't have all that, we had to build things ourselves. This is why I
16:35
really like this, so I started to train things by myself.
16:41
Not every H5 ends up a success: there were only two that worked, one is this one,
16:47
and the other one was the hand sign model, which I'll show you really quickly. Now let's test these things on Streamlit.
16:55
So here is my project; let me show you. [Music]
17:12
See, now it shows you the six styles, and then you drag in a random file,
17:18
either MP3 or WAV, and it analyzes the weights of the six styles integrating
17:24
with each other. For example, let me put my own music there,
17:30
put something in, and run it as usual.
17:37
See, now it's analyzing based on my H5 [Music]
17:46
file, and what it produces is a prediction: we have, say, 7.4% of
17:52
this style. It uses these parts to recognize how the six styles integrate with each
17:58
other. It recognized that this piece was closest to this style over there, which
18:04
was 32%. And this is how it ended: Innovative Existence at 36%,
18:10
so it learned how different music styles relate and correlate to each other based on these patterns.
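A rough sketch of what such a Streamlit page can look like (placeholder style names, file paths, and the reuse of the earlier extract_fixed helper and H5 model are all assumptions):

```python
import streamlit as st
from tensorflow import keras

STYLES = [f"style_{i}" for i in range(6)]        # placeholders for the six class names
model = keras.models.load_model("style_model.h5")

uploaded = st.file_uploader("Drop an MP3 or WAV", type=["mp3", "wav"])
if uploaded is not None:
    with open("tmp_input", "wb") as f:           # stage the upload for feature extraction
        f.write(uploaded.read())
    feats = extract_fixed("tmp_input").reshape(1, -1)
    probs = model.predict(feats)[0]
    for name, p in zip(STYLES, probs):
        st.write(f"{name}: {p:.1%}")             # e.g. "style_3: 36.0%"
```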
18:16
This is my personal achievement on this small thing. Sorry, I forgot how
18:25
to stop it; yeah, let me stop here.
18:31
Now let's get back to the main project, section two. Section two is image processing and MediaPipe for video
18:37
processing: exploring MediaPipe, hand tracking, and analyzing different
18:43
data using our training method. The method we use is data preparation, data training, and real-time
18:49
integration. As for what failed: the Unity build. I will not show it here,
18:54
because the ONNX file is even more complicated. I only had two H5 successes by myself; usually people borrow
19:01
from other libraries, but other people's libraries could be bad, so you do it yourself. My teammates handled theirs
19:08
really well, but for now I'm just going to talk about my contribution to this small micro project.
19:16
The main steps in the process are cleaning the data and its uses; my teammates did this type of work and
19:22
handled it really well, and the text-to-speech was done
19:28
really well too. We have different formats, and I'm using the Mac-native one. So now we
19:34
have a backup plan: my backup plan was to use Unity to develop a number recognition system, so we
19:41
start from basic numbers, to hands, to video, to audio, and integrate all of
19:49
them together. So now, what does our first prototype do? Our first prototype's
19:55
input is captured using MediaPipe, with the finger points, and then we learn
20:01
these finger point patterns, as sketched below.
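A minimal sketch of that input step with OpenCV and MediaPipe Hands (the webcam index and the downstream classifier hook are assumptions):

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)  # default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        landmarks = result.multi_hand_landmarks[0].landmark
        points = [(p.x, p.y, p.z) for p in landmarks]  # 21 finger/palm points
        # ...feed `points` to the gesture classifier here...
    cv2.imshow("hands", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```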
20:09
So, what's the potential of this basic prototype? It can, of course, help with learning sign language, helping people with disabilities; and for music, sign language can help us
20:14
learn conducting comparisons; for gaming, we can use it as a gaming hand controller; and for new music art projects,
20:22
some people have already come up with these things. You can also just play for fun, like the game I developed, which we'll show you
20:31
later. H5 versus ONNX: they are two different formats, one is H5, one is ONNX.
20:38
The H5 hierarchical data format is used with TensorFlow and Keras, while ONNX was
20:44
mostly for the Unity Barracuda things, but that didn't work, so I'm mostly using H5 in this
20:51
case. The solutions and steps focus on cleaning data, testing real-time
20:56
integration, and refining the text-to-speech; then you try Unity, and finally you end up with Flask. Flask was also Husam's
21:03
idea; Husam is really great, and we learned so many good things from
21:12
him. The Unity side, the Barracuda thing, was something I failed at; I want to try it more, because Unity was better than other
21:19
platforms. This is the Barracuda testing. And what is H5? As I showed you before, H5
21:25
is called the hierarchical data format. These things work by reading through different formats, for example CSV and
21:33
also the .npy files like I showed you before, the two-dimensional things, as I call
21:38
them. The use case: it's commonly used with TensorFlow and Keras, and it can store different neural
21:45
network weights and training figures. The ONNX format was better for
21:50
Barracuda, and also for moving models between PyTorch, TensorFlow, and different
21:56
platforms. So how do we convert between them? We created code converting between H5 and ONNX; with optimization, these
22:03
things are also good for GPU and mobile use. There are also other
22:10
formats. How do we convert between the different formats? We convert from H5 to ONNX using pip-installed tools in the style of
22:18
these snippets: we copy-paste code along the lines of
22:23
torch.onnx.export(model, input_tensor, "model.onnx"), and then we create a converter
22:30
to convert these things into the other format. But this always failed for me, I don't know why; it's something to keep exploring in the future.
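The snippet quoted above matches PyTorch's exporter; for a Keras .h5 model like ours, one common route is tf2onnx. A hedged sketch, not necessarily the converter the project actually used:

```python
import tf2onnx
from tensorflow import keras

# Load the trained Keras model and export it to ONNX in one call.
model = keras.models.load_model("style_model.h5")  # hypothetical file name
onnx_model, _ = tf2onnx.convert.from_keras(model, output_path="model.onnx")
```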
22:35
Flask was the best idea, and I learned so much
22:41
with Flask; I'll show it to you really quickly later. As I said, I'm going to
22:48
tell you how I set up this Flask app. Basically, it was about deploying a
22:53
Flask app using different platforms. My solution was to use Render, and my
23:01
teammates also have their Streamlit and Hugging Face setups.
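A minimal sketch of such a Flask app (the endpoint name, temp-file handling, and reuse of the earlier model and extract_fixed helper are assumptions; a host like Render runs it as an ordinary web service):

```python
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("style_model.h5")  # loaded once at startup

@app.route("/predict", methods=["POST"])           # hypothetical endpoint
def predict():
    request.files["file"].save("tmp_input")        # uploaded MP3/WAV
    feats = extract_fixed("tmp_input").reshape(1, -1)
    return jsonify(model.predict(feats)[0].tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)             # bind to 0.0.0.0 for hosts like Render
```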
23:07
Now let's test it on the different platforms. Let me close one of them first.
23:14
[Music]
23:36
Yeah, first let me show you this
23:41
basic testing [Music] system. First, we test whether this hand
23:48
sign detection is working or not. [Music]
24:15
Wait, wait, this thing is supposed to be like this,
24:21
[Music]
24:31
or let me show you another one. [Music]
24:51
Sorry. So, for this project:
25:01
this project generates a different sign each round, and it plays a
25:08
guessing game with you. For example, this hand is the "math" sign, and
25:15
this is the "fool," and this is the "math" again, so it plays something similar to
25:21
rock-paper-scissors with you, and it randomly generates something. I also trained these signs
25:27
one by one, using MediaPipe to recognize the different hand landmarks, and I made 150 different
25:35
pictures to learn these signs from. So this is my unique design, similar to
25:41
a model pattern recognition system. Now let me show you. Sorry, my computer is a little slow, so this
25:47
will be really unstable. For example, I'm signing "math"; it says you chose "mice," you
25:54
win. Give it a second; it will say "mice" at 0.75, and then it randomly generates
26:01
something to see if you win or not. You win. Now let's do another hand; you can barely even move your hand like this. You chose the "fool";
26:08
no, I chose the "fool," computer wins. Give it a little bit of time: you chose the "fool" at 100%. This one is easy because
26:16
it's just a peace sign. And now we have the cat. Let's see, the cat is the hard one:
26:21
you have to curve the fingers. No, you chose the cat, computer wins. See, you chose the
26:30
cat, it's a tie; you chose the cat, you win. Yeah, this is basically what
26:37
happens. I'll stop it, just in case it's draining my computer, and it was
26:43
generating there. So now, for the last part, I would
26:48
like to show our final project at the presentation night. Thankfully,
26:53
my teammates Rigoberto and Marco Chen and I were all working on these things. For the
27:00
last part, I would like to show a recap of Marco Chen's work. So here is Marco
27:07
[Music]
27:12
Chen. Sorry, now let's introduce our project.
27:18
You can say something. So, actually, our project is related to computer
27:26
vision and data science. Basically, we use MediaPipe,
27:33
which is an open-source Python library related to
27:40
computer vision recognition. It provides a model that we can use with
27:49
our own dataset, training it to recognize different labels. Based on
27:56
those labels, we put JPG or MPG pictures into each label's folder and then feed them
28:04
into the model to train it. After we train it, we can also decide how
28:10
precise we want it to be, and we can fine-tune the model in the future to work on
28:16
smaller signs. For this project, what we did is train it on the dataset
28:22
of ASL signs, A to Z, to recognize all of those hand shapes; a training sketch follows below.
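That folder-per-label workflow matches MediaPipe Model Maker's gesture recognizer; a hedged sketch of that route (the dataset path and export directory are hypothetical, and this may not be the exact script used):

```python
from mediapipe_model_maker import gesture_recognizer

# One sub-folder per label (e.g. dataset/A, dataset/B, ...) holding JPG/PNG frames.
data = gesture_recognizer.Dataset.from_folder(
    dirname="dataset",
    hparams=gesture_recognizer.HandDataPreprocessingParams(),
)
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

options = gesture_recognizer.GestureRecognizerOptions(
    hparams=gesture_recognizer.HParams(export_dir="exported_model")
)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)
model.export_model()  # writes a .task bundle for on-device inference
```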
28:28
So now I'm going to show a demo
28:34
here on my laptop; I'm going to start the live camera feed. So basically,
28:43
oh, that's perfect. So we have a
28:49
sign here. First of all, if we put
28:55
our hand up, there are landmarks detected on the fingers; it has 21 points
29:02
marked on your hand to recognize the hand shape. Yeah, definitely you could do it
29:07
with something like face shapes too, but for now I can try a hand sign to let the model recognize my sign
29:15
language. So I do the A, and it gives the decided letter and also the confidence,
29:24
how precisely the model recognized it. Okay, then let's try two different
29:30
signs now. Okay.
29:37
[Music]
29:49
Okay, this is an S. No, this is S, S like
29:55
this. Yeah, that's E. Okay, this is A; uh-huh, this is A. Okay, and this is B: if
30:04
you close the fingers, close the fingers, that will
30:10
be... okay. And then, if you make your fingers cross, that
30:15
[Music] will be... and you can try to be...
30:22
And then three fingers is W. Okay, that's great.
30:29
The letter display only keeps a record when the sign changes;
30:34
if your sign stays the same, it won't register, so try to keep it moving a little.
30:42
[Music] Okay, thank you. That'll be all. Thank you
30:47
for watching our video. We hope you enjoyed it; have a nice day!
#Enterprise Technology
#Intelligent Personal Assistants