How to build your own speech to text based call recording system

Posted in C# category for Beginner level

This article presents an easy way to build your own speech-to-text based call recording system.

Speech recognition is the process of converting spoken words to text. The technology has so many possible uses that it made me wonder whether I could build an application capable of it myself. After a lot of research, I found a software development kit that finally let me start experimenting. This article presents my observations and gives a technical overview of building a softphone that uses speech-to-text based call recording. You can follow the steps using a sample program.

To begin with, let's see the configuration steps.

  1. First, you need to download the Ozeki C# Speech to text sample program.

  2. Extract the sample program into a directory.

  3. Then, load it into Visual Studio 2010.

  4. In the telephone initialization section of the PhoneMain.cs file, replace "your local IP Address" with the local IP address of the PC on which the system runs. The section looks like this:

    private void InitializeSoftPhone()
    {
        softPhone = SoftPhoneFactory.CreateSoftPhone("", 5700, 5750, 5780);
        softPhone.IncommingCall += new EventHandler<VoIPEventArgs<IPhoneCall>>(softPhone_IncommingCall);
        phoneLine = softPhone.CreatePhoneLine(new SIPAccount(true, "oz891", "oz891", "oz891", "oz891", "", 5060));
        phoneLine.PhoneLineInformation += new EventHandler<VoIPEventArgs<PhoneLineInformation>>(phoneLine_PhoneLineInformation);

        softPhone.RegisterPhoneLine(phoneLine);
    }

    Search for the following line:

    softPhone = SoftPhoneFactory.CreateSoftPhone("your local IP Address", 5700, 5750, 5780);

    ...and replace "your local IP Address" with the local IP address of the PC on which the system runs.

    You will also need to provide the account details of your chosen SIP PBX as the values of the SIP account object, as in the following line:

    phoneLine = softPhone.CreatePhoneLine(new SIPAccount(true, "oz891", "oz891",
        "oz891", "oz891", "", 5060));

  5. Finally, build and run the program. Good luck!
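If you are not sure which local IP address to put into the softphone initialization, the machine can report it itself. The following is only a minimal sketch using the standard .NET networking classes; the `LocalAddress` class and `FindLocalIPv4` method are names invented for this illustration and are not part of the sample program or the SDK.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

static class LocalAddress
{
    // Returns the first non-loopback IPv4 address of this machine,
    // or null if the machine has no such address.
    public static string FindLocalIPv4()
    {
        return Dns.GetHostAddresses(Dns.GetHostName())
            .Where(a => a.AddressFamily == AddressFamily.InterNetwork
                        && !IPAddress.IsLoopback(a))
            .Select(a => a.ToString())
            .FirstOrDefault();
    }
}
```

Calling `Console.WriteLine(LocalAddress.FindLocalIPv4());` prints the address that was found. On a PC with several network adapters the first address is not necessarily the one your PBX can reach, so check the result before pasting it into the code.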

To keep your attention, let me introduce the sample program's graphical user interface. The program was developed with Microsoft WPF (Windows Presentation Foundation). Its GUI (see the picture) is simple but representative, since demonstration is its main purpose, and it offers basic telephone functions such as setting up calls, receiving calls, and sending and receiving DTMF signals.


Now, for those who are more interested in the technical details, I will briefly present the code. The PhoneMain.cs code-behind file belongs to the program interface: it handles the control events of the interface and connects the GUI with the logic. It contains the full logic of the sample program.

    public partial class PhoneMain : Form
    {
        ISoftPhone softPhone;
        IPhoneLine phoneLine;
        PhoneLineInformation phoneLineInformation;
        IPhoneCall call;
        SpeechRecognitionEngine speechRecognition;
        Choices voiceCommands;
        List<string> SpeechWords = new List<string>();
        bool inComingCall;
        ...

ISoftPhone represents a telephone, and its telephone line is represented by IPhoneLine. It is also possible to develop a multiline phone.

IPhoneLine represents a telephone line that can be registered to a SIP PBX, for example Asterisk, 3CX, or other PBXs offered by free SIP providers. Registration is made via a SIP account.

PhoneLineInformation is an enum type that represents the status of the telephone line in relation to the PBX, for example registered, not registered, or successful/unsuccessful registration.

IPhoneCall represents a call: its status, its direction, the telephone line on which it was created, the called party, and so on.

The remaining objects handle audio. One is an optional device that helps process the incoming audio data that comes from the remote end. Another plays the received audio data on the speaker, and a third processes the audio data that comes from the default input device (microphone) of the operating system. Two streams save the audio: one saves the received audio data and the other saves the sent audio data.
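The `SpeechRecognitionEngine` and `Choices` fields in the class listing look like the types from .NET's `System.Speech.Recognition` namespace, although the article does not show how the sample wires them up. The sketch below is therefore an assumption, not the sample's actual code: it supposes that the call audio has already been saved to a WAV file, that a speech recognizer for the system language is installed on Windows, and that the project references the System.Speech assembly. `KeywordSpotter`, `BuildGrammar` and `FindKeywords` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.Recognition;
using System.Threading;

static class KeywordSpotter
{
    // Builds a grammar that matches only the given keywords.
    public static Grammar BuildGrammar(string[] keywords)
    {
        return new Grammar(new GrammarBuilder(new Choices(keywords)));
    }

    // Runs recognition over a whole WAV file and returns every keyword
    // that was recognized, in the order it was heard.
    public static List<string> FindKeywords(string wavPath, string[] keywords)
    {
        var hits = new List<string>();
        using (var done = new ManualResetEvent(false))
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(BuildGrammar(keywords));
            engine.SpeechRecognized += (sender, e) => hits.Add(e.Result.Text);
            // Raised when recognition stops, here at the end of the file.
            engine.RecognizeCompleted += (sender, e) => done.Set();
            engine.SetInputToWaveFile(wavPath);
            engine.RecognizeAsync(RecognizeMode.Multiple);
            done.WaitOne();
        }
        return hits;
    }
}
```

A call such as `KeywordSpotter.FindKeywords("call.wav", new[] { "price", "order" })` would return the keywords it heard; the file name and the keywords are placeholders. The method blocks until the whole file has been processed, so in a GUI application it is safer to call it from a background thread than from the UI thread.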

Putting all this together, the following happens. After you run the program, the telephone automatically registers to the given SIP PBX with the given SIP account. The softphone is then ready to make and receive calls, and to send and receive DTMF signals during calls for navigating IVR systems. After the call ends, you receive a notification about the keywords that were mentioned. And with that, you have experienced the amazing technique of speech recognition.
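The article does not show how the end-of-call notification is put together, so here is one possible way to turn a list of recognized words (such as the `SpeechWords` list in the class) into a short message. It is a sketch under that assumption; `KeywordReport` and `Summarize` are illustrative names, not part of the sample program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeywordReport
{
    // Counts each recognized word (ignoring case) and formats a one-line summary,
    // most frequent word first.
    public static string Summarize(IEnumerable<string> recognizedWords)
    {
        string[] parts = recognizedWords
            .GroupBy(w => w.ToLowerInvariant())
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => g.Key + " (" + g.Count() + ")")
            .ToArray();

        return parts.Length == 0
            ? "No keywords were mentioned during the call."
            : "Keywords mentioned: " + string.Join(", ", parts);
    }
}
```

For example, `KeywordReport.Summarize(new[] { "price", "Price", "order" })` returns `Keywords mentioned: price (2), order (1)`. The resulting string could then be shown in a message box or written to a log when the call ends.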

Please take this little guide as an appetizer for discovering speech recognition and seeing the greatness behind it. If you want to know more, you should visit the site I used as the source of my article:

I hope this article was useful. Do let me know your feedback by responding to this article.



About the Author

Full Name: Wilson Matthews
