This article presents the easiest way to build your own speech-to-text-based call recording system.
Speech recognition is the process of converting spoken words to text. This technology has so many possible uses that it made me wonder whether I could build such an application myself. I did a lot of research before I found a software development kit that finally let me start experimenting. This article presents my observations and gives a technical review of building a softphone that uses speech-to-text-based call recording. You can easily follow the steps through a sample program.
To begin with, let's look at the configuration steps.
First, download the Ozeki C# Speech to text sample program. Extract the sample program into a directory, then load it into Visual Studio 2010.
In the telephone initialization section of the PhoneMain.cs file, replace "your local IP Address" with the local IP address of the PC on which the system runs, like this:
private void InitializeSoftPhone()
{
    // First argument: the local IP address of the PC running the softphone
    softPhone = SoftPhoneFactory.CreateSoftPhone("192.168.91.42", 5700, 5750, 5780);
    softPhone.IncommingCall += new EventHandler<VoIPEventArgs<IPhoneCall>>(softPhone_IncommingCall);

    // SIP account of the selected PBX (example values)
    phoneLine = softPhone.CreatePhoneLine(new SIPAccount(true, "oz891", "oz891", "oz891", "oz891", "192.168.91.212", 5060));
    phoneLine.PhoneLineInformation += new EventHandler<VoIPEventArgs<PhoneLineInformation>>(phoneLine_PhoneLineInformation);

    // Register the line to the PBX so the phone can make and receive calls
    softPhone.RegisterPhoneLine(phoneLine);
}
In other words, search for the following line:
softPhone = SoftPhoneFactory.CreateSoftPhone("your local IP Address", 5700, 5750, 5780);
...and replace "your local IP Address" with the IP address of the PC on which the system runs.
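If you would rather not look up the address by hand, the helper below is a sketch of how it could be detected at runtime with the standard System.Net.NetworkInformation classes. LocalAddress and FindLocalIPv4 are hypothetical names, not part of the Ozeki SDK. On a PC with several adapters (VPN, virtual machines), the first IPv4 address found may not be the one your SIP PBX can reach, so treat this as a starting point rather than a definitive solution.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Net.Sockets;

// Hypothetical helper: finds a local IPv4 address to pass to CreateSoftPhone.
public static class LocalAddress
{
    // Returns the first IPv4 address of an operational, non-loopback
    // network interface, or null if no such address exists.
    public static string FindLocalIPv4()
    {
        return NetworkInterface.GetAllNetworkInterfaces()
            .Where(nic => nic.OperationalStatus == OperationalStatus.Up
                       && nic.NetworkInterfaceType != NetworkInterfaceType.Loopback)
            .SelectMany(nic => nic.GetIPProperties().UnicastAddresses)
            .Select(info => info.Address)
            .Where(addr => addr.AddressFamily == AddressFamily.InterNetwork
                        && !IPAddress.IsLoopback(addr))
            .Select(addr => addr.ToString())
            .FirstOrDefault();
    }
}
```

The result could then be used as the first argument of the line above, for example softPhone = SoftPhoneFactory.CreateSoftPhone(LocalAddress.FindLocalIPv4(), 5700, 5750, 5780); with a fallback to a configured address when it returns null.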
You will also need to provide the user data of your selected SIP PBX as the values of the SIP account object, similar to the following line:
phoneLine = softPhone.CreatePhoneLine(new SIPAccount(true, "oz891", "oz891",
"oz891", "oz891", "192.168.91.212", 5060));
Finally, you only need to build and run the program. Good luck!
To keep your attention, let me introduce the sample program's graphical user interface.
The program has been developed with Microsoft WPF (Windows Presentation Foundation) technology. Its GUI (see the picture) is simple but representative, since demonstration is the main goal of its existence, and it provides the basic telephone functions (such as setting up calls, receiving calls, and sending and receiving DTMF signals).

Figure: the GUI of the sample program
Now, for those who are more interested in the technical details, I will briefly present the code.
The PhoneMain.cs code-behind file belonging to the program interface handles the control events of the interface and connects the GUI with the logic. It contains the full logic of the sample program.
public partial class PhoneMain : Form
{
ISoftPhone softPhone;
IPhoneLine phoneLine;
PhoneLineInformation phoneLineInformation;
IPhoneCall call;
SpeechRecognitionEngine speechRecognition;
Choices voiceCommands;
List<string> SpeechWords = new List<string>();
bool inComingCall;
...
ISoftPhone: It represents a telephone, and its telephone line is represented by IPhoneLine. It is also possible to develop a multiline phone.
IPhoneLine: It represents a telephone line that we can register to a SIP PBX, for example Asterisk, 3CX, or other PBXs offered by free SIP providers. Registration is made via a SIP account.
PhoneLineInformation: It is an enum type that represents the status of the telephone line in relation to the PBX, for example registered, not registered, or successful/unsuccessful registration.
IPhoneCall: It represents a call: the status of the call, the direction of the call, the telephone line it was created on, the called party, etc.
OzPipeStream: It is an optional component that helps process the incoming audio data coming from the remote end.
ozWavePlayer: It plays the received audio data on the speaker.
ozWavRecorder: It processes the audio data coming from the default input device (microphone) of the operating system.
ReceivedStream: This is the stream that saves the received audio data.
SentStream: This is the stream that saves the sent audio data.
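The article does not say in which format ReceivedStream and SentStream store the audio. Speech recognizers usually read WAV files, so here is a sketch of wrapping raw PCM bytes in a standard 44-byte WAV (RIFF) header. It assumes 16-bit PCM at 8000 Hz mono, which is common in telephony but is only an assumption here; WavWriter and ToWav are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: turns raw PCM bytes (e.g. the contents of a stream
// like ReceivedStream) into a complete WAV file image in memory.
public static class WavWriter
{
    public static byte[] ToWav(byte[] pcm, int sampleRate = 8000, short channels = 1, short bitsPerSample = 16)
    {
        int blockAlign = channels * bitsPerSample / 8;  // bytes per sample frame
        int byteRate = sampleRate * blockAlign;         // bytes per second

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms, Encoding.ASCII)) // BinaryWriter is little-endian, as WAV requires
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);           // size of everything after this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));

            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                        // fmt chunk size for plain PCM
            w.Write((short)1);                  // audio format 1 = uncompressed PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write((short)blockAlign);
            w.Write(bitsPerSample);

            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);                // size of the audio payload
            w.Write(pcm);

            w.Flush();
            return ms.ToArray();
        }
    }
}
```

For example, File.WriteAllBytes("received.wav", WavWriter.ToWav(pcmBytes)), where pcmBytes is a hypothetical byte array read from the stream, would produce a file that System.Speech's SetInputToWaveFile can open. Alternatively, SpeechRecognitionEngine.SetInputToAudioStream accepts raw PCM together with a SpeechAudioFormatInfo that describes it, which avoids writing the header at all.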
Put all of these together and the following happens. When you run the program, the telephone automatically registers to the given SIP PBX with the given SIP account. This makes the softphone ready to make and receive calls, and to send and receive DTMF signals during calls for navigating IVR systems. After the call ends, you receive a notification about the keywords that were mentioned. And now you have experienced the amazing technique of speech recognition.
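The article does not show how the recorded call is turned into the keyword notification. As a sketch, assuming the call audio has been saved as a WAV file and an en-US recognizer is installed, System.Speech (the library that provides the SpeechRecognitionEngine and Choices types used by the PhoneMain fields) can spot a fixed list of keywords such as the SpeechWords list. CallKeywordSpotter, FindKeywords, Summarize and the 0.6 confidence threshold are hypothetical; this is not necessarily how the Ozeki sample does it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;  // Windows only: .NET Framework or the System.Speech NuGet package
using System.Threading;

public static class CallKeywordSpotter
{
    // Hypothetical threshold; tune it against your own recordings.
    const float MinConfidence = 0.6f;

    // Runs keyword recognition over a recorded WAV file and returns every keyword heard.
    public static List<string> FindKeywords(string wavPath, string[] keywords)
    {
        if (keywords == null || keywords.Length == 0)
            throw new ArgumentException("At least one keyword is required.", nameof(keywords));

        var heard = new List<string>();
        using (var finished = new ManualResetEvent(false))
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // The grammar accepts only the keywords, like a Choices-based voiceCommands field.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(keywords))));

            engine.SpeechRecognized += (s, e) =>
            {
                // A tiny grammar tends to force-match unrelated speech, so drop weak results.
                if (e.Result.Confidence >= MinConfidence)
                    lock (heard) heard.Add(e.Result.Text);
            };
            engine.RecognizeCompleted += (s, e) => finished.Set();

            engine.SetInputToWaveFile(wavPath);
            engine.RecognizeAsync(RecognizeMode.Multiple);  // keeps recognizing until the file ends
            finished.WaitOne();
        }
        return heard;
    }

    // Builds the notification text, e.g. "help: 2, price: 1", most frequent keyword first.
    public static string Summarize(IEnumerable<string> heard)
    {
        var parts = heard
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.Key + ": " + g.Count());
        string text = string.Join(", ", parts);
        return text.Length == 0 ? "No keywords were mentioned." : text;
    }
}
```

For example, CallKeywordSpotter.Summarize(CallKeywordSpotter.FindKeywords("received.wav", SpeechWords.ToArray())) would produce a short string that could be shown in a message box after the call. Running recognition on the saved file after the call, rather than live, keeps the telephony code and the recognizer independent of each other.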
Please take this little guide as an appetizer for discovering speech recognition and the potential behind it. If you want to know more, visit the site I used as the source of my article: http://www.voip-sip-sdk.com/p_135-how-to-implement-the-c-open-source-voip-softphone-implementation-of-ozeki-sip-sdk-for-speech-to-text-based-call-recording-voip.html
I hope this article was useful. Please let me know your feedback by responding to this article.
Thanks