Ivelin Ivanov - blog: 08/01/2005

You are reading the second edition of this article. It was revised based on the substantial and useful feedback that the initial version received.

In a recent IM chat over Skype, Marco Moneiro and I were discussing various VoIP service delivery platforms. One of his comments struck me. He pointed out that Parlay has an API that is easier to program than JAIN SLEE, while SIP Servlets are about as hard to program as JAIN SLEE.

This was certainly intriguing to me, since we are implementing a JAIN SLEE container (the first and only certified open source implementation). I should know how it stacks up against the other competing or complementing open standards.

Marco works at the Portugal Telecom research lab and has direct experience with multiple VoIP technology platforms. As an active contributor to Mobicents, he often shares valuable perspectives with the team.

So I took the time to look at some example applications and form my own point of view. After hours of playing with Parlay and SIP Servlet code, I reached a non-expert conclusion that more or less agrees with Marco's, but with certain qualifiers.

Given that each of the three standards has specification documents running to hundreds of pages, my thoughts below merely scratch the surface of the problem and should be taken with a grain of salt. Many of them are subjective, as they are based on limited information and raw intuition. My opinion may change in the future as I gain more experience.

Telephony protocols and abstraction layers

Before we begin comparing the platforms, I would like to emphasize one criterion they will be measured against: communication protocol abstraction.

Everyone seems to talk and write about SIP these days. It is no doubt an elegant protocol, nearly as simple as HTTP, proposed by the IETF and quickly picking up momentum. However, unlike the Web, where HTTP has been the predominant transport for UI content almost from the start, SIP is just another kid on the telephony block.

Telecom services have existed for over 100 years. You can imagine how much legacy has built up in the industry. It is a $600 billion global market, with enormous investments in technology and infrastructure. Add to the mix government regulations for service quality, downtime, and auditing. To expect that a new session initiation protocol, however brilliant, will come in and swiftly wipe everything else out is...utopian.

VoIP represents a mere 5% of telephony today, even though it has been available since the mid-90s, when H323 was the packet-based protocol slated to commoditize telephony. Well... the Web came onto the stage around the same time, revolutionized the way we work and communicate, then crashed, then came back up. VoIP has yet to make its point. How much have you heard of H323? How about http://? A recent survey by TNS confirms that consumer awareness of VoIP is still low.

Furthermore, emerging telephony protocols such as IAX (Asterisk) and Skype are aggressively capturing developer mind-share. One sign of their influence is the new generation of devices that support these protocols directly.

Here is a Skype phone on the market today:


And here is an Asterisk-based PBX appliance:

I firmly believe that VoIP will be a significant part of our lifestyle in the near future. However, it won't happen just by replacing existing infrastructure. Rather, new converged services of higher value and lower cost, which interoperate nicely with legacy systems, will drive consumer interest and ultimately win VoIP a bigger market share.

To allow rapid development of new-generation intelligent features, server-side developers should not need to know whether the endpoints connect to the server via SIP, H323, SS7, IAX, Skype, or any other protocol. They should be able to focus on the logic that adds value for the end user.
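To make the idea concrete, here is a minimal sketch built on a hypothetical `CallParty` interface (not part of Parlay, SIP Servlets or SLEE). The feature logic is written once against the interface, and per-protocol adapters translate it into SIP or IAX2 messages. The adapters here only record the message they would send; real ones would sit on top of a protocol stack, and the message strings are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical protocol-neutral view of the calling party. Feature code only sees this.
interface CallParty {
    void connectTo(String destination);
    void release();
}

// Hypothetical SIP adapter: records the SIP action it would take instead of using a stack.
class SipParty implements CallParty {
    final List<String> sent = new ArrayList<>();
    private final String uri;

    SipParty(String uri) { this.uri = uri; }

    public void connectTo(String destination) { sent.add("forward INVITE to " + destination); }
    public void release() { sent.add("reject INVITE from " + uri); }
}

// Hypothetical IAX adapter: records, roughly, the IAX2 frame it would send.
class IaxParty implements CallParty {
    final List<String> sent = new ArrayList<>();
    private final String peer;

    IaxParty(String peer) { this.peer = peer; }

    public void connectTo(String destination) { sent.add("NEW call to " + destination); }
    public void release() { sent.add("REJECT call from " + peer); }
}

// Feature logic written once against CallParty: forward a dialed number, or refuse the call.
class ForwardingFeature {
    private final Map<String, String> forwards; // dialed number -> destination

    ForwardingFeature(Map<String, String> forwards) { this.forwards = forwards; }

    void onIncomingCall(CallParty caller, String dialed) {
        String target = forwards.get(dialed);
        if (target == null) {
            caller.release();
        } else {
            caller.connectTo(target);
        }
    }
}
```

Adding H323 or SS7 support in this design means writing one more adapter; `ForwardingFeature` does not change.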

Now that we have covered protocol abstraction, let's go back to the original topic and compare examples for Parlay, SIP Servlets and SLEE.

Parlay/OSA

Parlay has a natural programming model, which resembles the style of sequential code seen in desktop applications.

I used the Ericsson Network Resource Gateway SDK, which is freely distributed for research.

Here is a sequence diagram from working Java code implementing a Multi Call Control feature. Disregard the fancy name; the logic is actually pretty simple. If you are not familiar with telecom terminology, a feature is what a programmer would typically call a service. Take a look:

Feature.handleCall() is the key method in the application. It answers a phone call, asks the user to enter a digit, says something in return and routes the call to the next feature.

If the author had to write the same application for a text user interface, the structure of the code would be essentially the same: start the app, open the console input, ask for a number, print a line of text in response and forward control to another routine. Plain and simple!
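For comparison, here is a rough console analogue of `handleCall()`. It is a sketch with invented names (`ConsoleFeature`, `handleSession`), not code from the Ericsson SDK: reading a line plays the role of `askDigit()`, printing plays the role of `say()`, and returning the destination stands in for `route()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Map;

// Text-UI analogue of the Parlay handleCall() flow: same sequential shape,
// with the console standing in for the phone.
class ConsoleFeature {
    private final String question;                  // ~ Configuration.getQuestion()
    private final Map<String, String> destinations; // answer -> destination

    ConsoleFeature(String question, Map<String, String> destinations) {
        this.question = question;
        this.destinations = destinations;
    }

    /** Returns the chosen destination, or null if the answer was missing or invalid. */
    String handleSession(BufferedReader in, PrintStream out) {
        out.println(question);                      // ~ askDigit(): ask the question...
        String answer;
        try {
            answer = in.readLine();                 // ...and block until the user answers
        } catch (IOException e) {
            answer = null;                          // treat an I/O failure like a timeout
        }
        String destination = (answer == null) ? null : destinations.get(answer.trim());
        if (destination == null) {
            out.println("Goodbye.");                // ~ stop() + release(): end the session
            return null;
        }
        out.println("Connecting you to " + destination + "."); // ~ say(): feedback
        return destination;                         // ~ route(): hand control onward
    }

    public static void main(String[] args) {
        ConsoleFeature feature = new ConsoleFeature(
                "Press 1 for Sales or 2 for Support:",
                Map.of("1", "Sales", "2", "Support"));
        feature.handleSession(new BufferedReader(new InputStreamReader(System.in)), System.out);
    }
}
```

Passing the reader and writer in, rather than using `System.in` directly, also keeps the method easy to test with canned input.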

Here is the full source code of the feature:


/**
 * Invoked by MPCCProcessor when a subscriber has dialled the service number.
 * This method will reroute the call towards a destination that will be specified
 * by the calling subscriber, or terminate the call if no valid destination is
 * specified.
 *
 * @param aCall the new call
 * @param aFirstLeg the one and only leg of the call
 * @param anOriginatingAddress the address of the calling
 *     subscriber
 */
public void handleCall(
        TpMultiPartyCallIdentifier aCall,
        TpCallLegIdentifier aFirstLeg,
        TpAddress anOriginatingAddress)
{
    // prepare playing announcements to the calling subscriber.
    TpUICallIdentifier uiSession = itsUIProcessor.start(aCall);

    // ask the calling subscriber to choose a destination
    Object uiResult = itsUIProcessor.askDigit(uiSession,
            Configuration.INSTANCE.getQuestion(), false);

    // determine the destination based on the response
    Configuration.Area destination = getDestination(uiResult);

    // abort the call if no destination could be determined
    // (e.g. because the user response timed out)
    if (destination == null)
    {
        // no more user interaction needed
        itsUIProcessor.stop(uiSession);

        // terminate the call
        itsMPCCProcessor.release(aCall);
    }
    else // a (valid) destination was determined
    {
        // give the user feedback on the choice that was made
        itsUIProcessor.say(uiSession, destination.feedback, true);

        // reroute the call
        itsMPCCProcessor.route(aCall, anOriginatingAddress,
                destination.address);

        // cancel suspension of first leg
        itsMPCCProcessor.continueProcessing(aFirstLeg);

        // the application is no longer interested,
        // so disconnect the association between Ericsson Network Resource Gateway and the network
        itsMPCCProcessor.deassign(aCall);
    }
}

Pros and Cons of Parlay

Pros:

Intuitive API
Protocol abstraction: hides the details of the communications protocol
Lends itself to unit testing. Features are well encapsulated in logically complete methods.

Cons:

Scalability

Footprint: When the application code invokes Parlay methods like askDigit(), the execution environment has to remember the call stack, carry out a complex interaction with the endpoint and then return control to the application. Depending on the protocol, the interaction with the user can be synchronous (e.g. TCP) or asynchronous (e.g. UDP). The voice packets might be routed directly or via proxy servers. In any case, the askDigit() invocation has to appear as a simple synchronous call to the application developer. Such a deep level of abstraction raises concerns about the resources each such call consumes in the execution environment. How many OS threads, how much heap memory, how many TCP sockets and other system resources are tied up until the call returns?

Failover: If the server where the application is currently executing crashes for some reason, how does execution continue on a fallback server, transparently to the end user? That is hard. Possible, but very complicated.

Troubleshooting: In such a sophisticated and complex runtime system, how does one trace bugs such as interrupted calls? How do you tell whether the cause lies in the protocol layer, the runtime engine or the clustering code? Since the application developer is so far removed from the inner workings of the deployment platform, how can s/he intelligently determine where the root cause is and communicate it efficiently to the platform vendor when needed?
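To make the footprint concern concrete, here is a hypothetical sketch (class and method names are invented, and this is not how any particular Parlay gateway is built) of a platform presenting an asynchronous digit event as a synchronous `askDigit()` call. Every call waiting inside `askDigit()` parks one thread until the digit arrives or the timeout fires, so thousands of concurrent calls awaiting input mean thousands of blocked threads and their stacks.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical bridge that presents an asynchronous digit event as a blocking call.
class BlockingDigitBridge {
    // One pending answer per call session that is waiting for a digit.
    private final Map<String, CompletableFuture<Character>> pending = new ConcurrentHashMap<>();

    // Application-facing call: looks synchronous, but parks the calling thread
    // until the digit arrives or the timeout expires.
    Character askDigit(String sessionId, long timeoutMillis) {
        CompletableFuture<Character> answer = new CompletableFuture<>();
        pending.put(sessionId, answer);
        // A real platform would now send the prompt to the endpoint (omitted).
        try {
            return answer.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return null; // no (valid) answer; the application decides what to do
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            pending.remove(sessionId);
        }
    }

    // Protocol-facing callback: invoked when a DTMF event for this session arrives.
    void onDigitEvent(String sessionId, char digit) {
        CompletableFuture<Character> answer = pending.get(sessionId);
        if (answer != null) {
            answer.complete(digit);
        }
    }

    // Number of threads currently parked inside askDigit().
    int blockedThreads() {
        return pending.size();
    }
}
```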

I am sure the engineers who designed Parlay are aware of all the drawbacks enumerated above and have good answers. However, having years of hands-on experience with scalability problems in enterprise systems, I have trouble seeing clear, practical solutions. If you know otherwise, please let me know; constructive criticism is welcome.

An interesting idea that is circulating in the community regarding Parlay is to implement its APIs as higher level services for SLEE. Working examples have not been shown yet, but it is a good mind teaser.

SIP Servlets

J2EE developers should feel right at home with the SIP Servlets API. This was a design goal for the authors of the spec, and they did a good job of it. The most popular API within the J2EE stack is the Servlet API. Many developers who are not familiar with EJB, JMS, JTA or JMX are quite comfortable with servlets. By induction, a vast majority of HTTP Servlet developers should be able to quickly jump on board and start cranking out useful VoIP services. Right?...maybe. Take a look at the SIP Servlet API:

java.lang.Object
|
+--javax.servlet.GenericServlet
|
+--javax.servlet.sip.SipServlet

All Implemented Interfaces: java.io.Serializable, javax.servlet.Servlet, javax.servlet.ServletConfig

public abstract class SipServlet
extends javax.servlet.GenericServlet

Provides an abstract class to be subclassed to create a SIP servlet.

This class receives incoming messages through the service method. This method calls doRequest or doResponse for incoming requests and responses, respectively. These two methods in turn dispatch on request method or status code to one of the following methods:

doInvite - for SIP INVITE requests
doAck - for SIP ACK requests
doOptions - for SIP OPTIONS requests
doBye - for SIP BYE requests
doCancel - for SIP CANCEL requests
doRegister - for SIP REGISTER requests
doSubscribe - for SIP SUBSCRIBE requests
doNotify - for SIP NOTIFY requests
doMessage - for SIP MESSAGE requests
doInfo - for SIP INFO requests
doProvisionalResponse - for SIP 1xx informational responses
doSuccessResponse - for SIP 2xx responses
doRedirectResponse - for SIP 3xx responses
doErrorResponse - for SIP 4xx, 5xx, and 6xx responses

The default implementation of doAck, doCancel and all the response handling methods are empty. All other request handling methods reject the request with a 500 error response.

Subclasses of SipServlet will usually override one or more of these methods.
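The dispatch rules in this Javadoc can be restated as a small table-driven model. This is a simplification for illustration only: the class and method names are invented, it does not use `javax.servlet.sip`, and it ignores everything the real container does besides picking a handler. It does make the default behaviors easy to see at a glance.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Simplified model of the SipServlet dispatch rules. It only maps requests
// and responses to handler names and default behaviors.
final class SipDispatchModel {
    private static final Map<String, String> REQUEST_HANDLERS = new HashMap<>();

    static {
        String[] methods = {"INVITE", "ACK", "OPTIONS", "BYE", "CANCEL",
                            "REGISTER", "SUBSCRIBE", "NOTIFY", "MESSAGE", "INFO"};
        for (String m : methods) {
            // "INVITE" -> "doInvite", "OPTIONS" -> "doOptions", ...
            REQUEST_HANDLERS.put(m, "do" + m.charAt(0) + m.substring(1).toLowerCase(Locale.ROOT));
        }
    }

    private SipDispatchModel() {}

    // doRequest(): dispatch on the request method.
    static String handlerForRequest(String method) {
        String handler = REQUEST_HANDLERS.get(method);
        if (handler == null) {
            throw new IllegalArgumentException("method not covered by this model: " + method);
        }
        return handler;
    }

    // doResponse(): dispatch on the status-code class.
    static String handlerForResponse(int status) {
        if (status < 100 || status > 699) {
            throw new IllegalArgumentException("not a SIP status code: " + status);
        }
        if (status < 200) return "doProvisionalResponse";
        if (status < 300) return "doSuccessResponse";
        if (status < 400) return "doRedirectResponse";
        return "doErrorResponse";
    }

    // What the base class does when a subclass does not override the handler.
    static String defaultBehavior(String handler) {
        if (handler.equals("doAck") || handler.equals("doCancel") || handler.endsWith("Response")) {
            return "do nothing";
        }
        return "reject with 500";
    }
}
```

For example, a servlet that overrides only `doInvite()` still silently accepts ACKs and responses, but answers an unexpected BYE or OPTIONS with a 500.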

The resemblance with HttpServlet is obvious. Both extend javax.servlet.GenericServlet and have similar doOptions() methods. What else is similar? Not much.

SIP Servlets have a fundamentally different purpose in life compared to HTTP Servlets. While the latter are primarily intended to serve HTML pages back to Web browsers, the former are primarily used for registering callers and then forwarding call setup requests to the current IP address of the callee. SIP Servlets can also be deployed on User Agents or Endpoints.

In a casual day-to-day scenario, a Web surfer, friendly Alice for example, uses a search engine to find a web site with the content she is looking for, then opens the site's URL and gets the HTML content from the web server powering the site.

Alice also has a SIP phone that is registered with the SIP server of her VoIP service provider. To call Bob, she uses the phone book or a people search engine on the web to find his phone number (because she always forgets it). Then she punches in the digits, and her phone asks the SIP server for the current IP address of Bob's phone, so that the two can establish a direct voice channel. The SIP server finds the information and returns it to Alice's phone, which then establishes a voice channel over RTP (not SIP!) with Bob's phone. At the end of the call, Alice hangs up the handset and her phone notifies the SIP server that the line is available to take incoming calls.

This is an oversimplified scenario that assumes a SIP-only world, but it fits the scope of this text. Notice that the actual content (voice) is not delivered over SIP, the way HTML is delivered over HTTP. It is instead delivered over RTP. SIP and RTP are independent standards: SIP is used for signaling, RTP for media. SIP Servlets do not deal with content delivery (media). It is not possible to play a voice message back to a caller directly from the SIP Servlet doResponse() method. Notice the difference with Parlay?
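To see the split between signaling and media in practice: a SIP INVITE normally carries an SDP body (RFC 4566) that tells the other side which IP address and port to send RTP to. SIP only transports this description; the voice itself then flows over RTP to that address. Below is a minimal parsing sketch with an invented `SdpMediaInfo` class and example addresses from the 192.0.2.0/24 documentation range. It deliberately handles only the simplest case.

```java
// Minimal sketch: find where a party wants its RTP audio sent, by reading the
// "c=" (connection address) and "m=audio" (media port) lines of an SDP body.
// Simplifications: takes the first c= line and the first audio stream; ignores
// media-level overrides, multicast TTL suffixes and port counts ("49170/2").
final class SdpMediaInfo {
    final String host;
    final int port;

    private SdpMediaInfo(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static SdpMediaInfo parse(String sdp) {
        String host = null;
        Integer port = null;
        for (String line : sdp.split("\r?\n")) {
            String[] f = line.length() > 2 ? line.substring(2).trim().split("\\s+") : new String[0];
            if (line.startsWith("c=") && host == null && f.length >= 3) {
                host = f[2];                    // c=<nettype> <addrtype> <address>
            } else if (line.startsWith("m=audio") && port == null && f.length >= 2) {
                port = Integer.parseInt(f[1]);  // m=audio <port> <proto> <fmt> ...
            }
        }
        if (host == null || port == null) {
            throw new IllegalArgumentException("SDP has no audio connection info");
        }
        return new SdpMediaInfo(host, port);
    }
}
```

A SIP Servlet can read or rewrite such a body, but the audio packets themselves never pass through it.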

What about VoiceXML? Many developers have heard of it as the cool XML language, similar to XHTML Forms, that allows users to input data by speaking instead of typing. Maybe SIP Servlets can serve VoiceXML content to SIP phones? It's possible...but it is not how VoiceXML is typically used. In most scenarios, VoiceXML is served by Web servers directly to endpoints or to an intermediary text-to-voice transformation engine. The following research paper from Columbia University illustrates well the roles of SIP, RTP, HTTP, and VoiceXML in a conference call system: http://www.nyman-workshop.org/2002/papers/2272.pdf

Here is a snippet of code from a SIP Servlet that connects a caller to a callee, either directly or via an alternative SIP proxy:


protected void doInvite(SipServletRequest req)
        throws ServletException, IOException {

    if (!req.isInitial()) { super.doInvite(req); return; }

    List contacts = resolve(req.getRequestURI());
    if (contacts.isEmpty()) contacts = resolve(req.getTo().getURI());

    if (!contacts.isEmpty()) {
        trace("Found contact info - " + contacts);
        Proxy p = req.getProxy();
        p.proxyTo(contacts);
        return;
    }
    String next = getContextParam("next_app");
    if (next == null) {
        // Reject as not found
        trace("User not found and no forwarding servlet info available");
        SipServletResponse resp = req.createResponse(404);
        trace(resp);
        resp.send();
        return;
    }

    // pass it to next proxy
    SipURI nextUri = getPlainURI((SipURI) req.getRequestURI());
    nextUri.setParameter("servlet", next);
    trace("User not found, route request to <" + nextUri + ">");
    Proxy p = req.getProxy();
    p.proxyTo(nextUri);
}
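The snippet relies on a `resolve()` helper whose source is not shown. One hypothetical way such a helper could work is an in-memory location service: REGISTER handling binds a user's address-of-record to the contact where the phone currently is, and `resolve()` looks it up. The `LocationService` class is invented for illustration, URIs are plain strings to keep it independent of the SIP Servlet API, and registration expiry is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory location service. REGISTER handling would call register();
// doInvite() would call resolve().
class LocationService {
    // address-of-record (e.g. sip:bob@example.com) -> current contacts of that user
    private final Map<String, CopyOnWriteArrayList<String>> bindings = new ConcurrentHashMap<>();

    void register(String addressOfRecord, String contact) {
        bindings.computeIfAbsent(addressOfRecord, k -> new CopyOnWriteArrayList<>())
                .addIfAbsent(contact);
    }

    void unregister(String addressOfRecord, String contact) {
        List<String> contacts = bindings.get(addressOfRecord);
        if (contacts != null) {
            contacts.remove(contact);
        }
    }

    // Mirrors how doInvite() uses resolve(): an empty list means "unknown here".
    List<String> resolve(String uri) {
        List<String> contacts = bindings.get(uri);
        return contacts == null ? Collections.emptyList() : new ArrayList<>(contacts);
    }
}
```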

Pros and Cons of SIP Servlets

Pros:

Familiar API: Most J2EE developers will be comfortable trying it without going through specialized training.
Scalability: SIP Servlets are designed to be mostly stateless. SipSession is the recommended structure where servlets should store non-persisted state. To ensure transparent failover the session has to be replicated to other cluster nodes. This is a well understood problem and there are practical solutions that can be borrowed from Http Servlet containers.

Cons:

Protocol Dependency: As the name implies, SIP Servlets are strictly tied to SIP and do not address other practical protocols like H323, SS7, and IAX. It is up to the application developer to come up with abstraction layers so that code written for SIP Servlets can be reused for non-SIP clients.
Danger of mixing front end with business logic: SIP Servlets seem vulnerable to some of the problems that HttpServlets exhibit.

Servlet developers tend to mix flow control with business logic, which makes the code harder to maintain and reuse.
Unit testing of servlet code is not simple, because it requires simulation of the communication protocol. Frameworks similar to Jakarta Cactus and HttpUnit will have to be developed to alleviate this inconvenience.

Lack of rigid component model: As VoIP applications mature and grow in size, there will likely be demand for component frameworks that separate the call control from the business logic classes and the persistence layer. Existing J2EE APIs like EJB and JMS will be prime candidates to fill the void. JAIN SLEE is also a contender. Time will tell which one developers will prefer in the long run.
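As a rough illustration of the mixing and testing concerns above, the routing decision in the `doInvite()` snippet can be pulled into a plain class with no SIP types, leaving the servlet to translate the decision into SIP actions. This is a sketch under that assumption; `InviteRoutingDecision` and its members are invented names, not part of the SIP Servlet API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical extraction of the decision made in doInvite(). No SIP types,
// so each branch can be unit tested with plain lists and strings.
final class InviteRoutingDecision {
    enum Kind { PROXY_TO_CONTACTS, FORWARD_TO_NEXT_APP, REJECT_NOT_FOUND }

    final Kind kind;
    final List<String> targets;

    private InviteRoutingDecision(Kind kind, List<String> targets) {
        this.kind = kind;
        this.targets = targets;
    }

    // requestUriContacts: result of resolving the Request-URI
    // toContacts:         result of resolving the To URI (the fallback)
    // nextApp:            value of the "next_app" context parameter, or null
    static InviteRoutingDecision decide(List<String> requestUriContacts,
                                        List<String> toContacts,
                                        String nextApp) {
        List<String> contacts = requestUriContacts.isEmpty() ? toContacts : requestUriContacts;
        if (!contacts.isEmpty()) {
            return new InviteRoutingDecision(Kind.PROXY_TO_CONTACTS, contacts);
        }
        if (nextApp == null) {
            return new InviteRoutingDecision(Kind.REJECT_NOT_FOUND, Collections.emptyList());
        }
        return new InviteRoutingDecision(Kind.FORWARD_TO_NEXT_APP, Collections.singletonList(nextApp));
    }
}

// doInvite() would then only translate the decision into SIP actions, e.g.
//   PROXY_TO_CONTACTS   -> req.getProxy().proxyTo(...)
//   REJECT_NOT_FOUND    -> req.createResponse(404).send()
//   FORWARD_TO_NEXT_APP -> proxy to the URI carrying the "servlet" parameter
```

One trade-off: unlike the original, the caller resolves both URIs up front, which costs an extra lookup when the Request-URI already matches.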

JAIN SLEE

SLEE (Service Logic Execution Environment) is viewed by some as the crown jewel of JAIN (Java APIs for Intelligent Networks).

If you glance through the specification, you will notice that many concepts sound, look and feel like J2EE. For example, ActivityContext is reminiscent of HttpSession, the CMP semantics of SBBs and Profiles are similar to EJB CMP, and transaction isolation, JMX and JNDI are present in SLEE with almost the same characteristics as in J2EE.

That is not accidental. The expert committee took its time to learn the J2EE lessons and cherry-pick the concepts, techniques, and best practices that best fit the needs of a Next Generation Service Delivery Platform. It took 5 long years from the initial formation of JSR 22 until its public release in late 2004.

Compared to the 100+ years of telecom legacy, 5 years is an impressive achievement for a standard that accommodates input from such a wide variety of industry players with disparate interests. The discussion took a long route, starting from the possible adoption of a Java API for SS7, which is a highly specialized telephony stack covering a wide range of networking layers, including a physical layer.

Eventually the experts recognized the need to attract mainstream application developers and made a significant effort to accommodate relevant J2EE ideas and constructs.

But the story is not all rosy. My own initial experience with JAIN SLEE was not a positive one. The spec is overwhelming and hard to grasp at first! This is a red flag for its future adoption rate. I have 5 years of hands-on experience with J2EE and am a member of the JBoss core team. JBoss is the J2EE server with the #1 market share. If I cannot get up and running with SLEE in a few days, what chance does it stand of winning mainstream developers' mind-share?

Those who have been in the front rows of the Web technology evolution, from CGI to PHP3 to JServ and eventually EJB, would probably appreciate a chance to skip a few iterations of the VoIP middleware evolution and go straight to SLEE. 7 years ago, EJB was insanely overwhelming for newcomers. Now, we all love EJB3.

Now let's look at some application code. Assume a new call comes into the SLEE. The call has one call leg that an SBB is interested in. SBBs are Service Building Blocks used to compose higher level intelligent features. They are similar in concept to EJBs.

If the SBB in question decides to disconnect the connection immediately, it does so in the callback from the SLEE:


public void onAlertingEvent(JccConnectionEvent event, ActivityContextInterface ac) {
    JccConnection connection = (JccConnection) ac.getActivity();
    connection.release();
}

For the SBB to create a new call, it will use the following sequence:


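// InitialContext.lookup() throws javax.naming.NamingException;
// "location" is a placeholder for the JNDI name of the JCC provider.
JccProvider provider = (JccProvider) new InitialContext().lookup("location");
JccCall call = provider.createCall(args);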

To receive events on this call, the SBB must subscribe in the following manner:


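// jccAcFactory: the JCC resource adaptor's ActivityContextInterface factory,
// an object the SBB obtains via JNDI (its methods are not static).
// sbbLocalObject: this SBB's local object, e.g. from SbbContext.getSbbLocalObject().
ActivityContextInterface ac =
    jccAcFactory.getActivityContextInterface(call);
ac.attach(sbbLocalObject);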

You will probably notice the resemblance to Parlay. The call protocol is abstracted via the JCC API, which fits nicely in the context of SLEE.

One unique characteristic of SLEE is its asynchronous signaling model. SLEE encourages components to communicate with each other via asynchronous events. SBBs are expected to implement self-contained units of logic that react to events, promptly perform their task and produce another event as output. The resulting event can be a message to another SBB or to a Resource Adaptor.

SBBs are mostly stateless, but they can have CMP fields which allow them to keep non-persisted state, similar to Stateful Session EJBs. Persisted state is stored in Profile Tables, which can be viewed as simplified relational database tables.
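To contrast with the sequential Parlay `handleCall()` code, here is a hypothetical sketch of the same ask-a-digit-then-route logic split into event handlers, in the style SLEE encourages. Each handler does a little work, keeps state in a field (standing in for a CMP field), emits an event and returns. The event and class names are invented; this models the style and is not SLEE or JCC API code. It assumes one instance per call, much as a SLEE might create one SBB entity per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical event-driven rewrite of the Parlay handleCall() flow.
class EventDrivenFeature {
    // Events sent out (to resource adaptors, in SLEE terms), recorded for inspection.
    final List<String> emitted = new ArrayList<>();
    private final Map<Character, String> destinations;

    // State carried between events; in an SBB this would live in a CMP field.
    private String awaitingDigitFor;

    EventDrivenFeature(Map<Character, String> destinations) {
        this.destinations = destinations;
    }

    // Event 1: a call arrives. Request digit collection and return at once;
    // no thread waits for the caller to press a key.
    void onCallArrived(String callId) {
        awaitingDigitFor = callId;
        emitted.add("CollectDigit(" + callId + ")");
    }

    // Event 2: the digit (or null on timeout) is reported back as a new event.
    void onDigitCollected(String callId, Character digit) {
        if (!callId.equals(awaitingDigitFor)) {
            return; // not a call this instance is waiting on
        }
        awaitingDigitFor = null;
        String target = (digit == null) ? null : destinations.get(digit);
        emitted.add(target == null
                ? "ReleaseCall(" + callId + ")"
                : "RouteCall(" + callId + " -> " + target + ")");
    }
}
```

The price is that the flow no longer reads top to bottom as in `handleCall()`; the benefit is that nothing is blocked between the two events.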

SBBs do not interact directly with objects outside of SLEE. Instead, they receive events from and send events to specialized Resource Adaptors, which in turn are responsible for interfacing with the outside world. Examples of RAs include the SIP RA, EJB RA and Web Services RA.

Multiple SBBs are assembled into Services (aka Features) such as Find Me and Call Forward.

Here is a SLEE component diagram:

Another important characteristic of SLEE is that it defines event delivery semantics based on SBB priority level. Since most of the signaling is asynchronous and goes through the SLEE event router, the prioritization semantics allow emergency (e.g. 911) calls to go through quickly, no matter how many other calls are in progress.
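Here is a simplified model of priority-ordered delivery, assuming that components attached to an activity receive each event highest priority first (SLEE SBB priorities are byte values, -128 to 127). It is not the SLEE event router: transactions, threading and queuing across activities are left out, and ties here simply keep attach order. Class and method names are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of priority-ordered event delivery on one activity.
class PriorityEventDelivery {
    private static final class Attachment {
        final String name;
        final int priority;
        final Consumer<String> handler;

        Attachment(String name, int priority, Consumer<String> handler) {
            this.name = name;
            this.priority = priority;
            this.handler = handler;
        }
    }

    private final List<Attachment> attached = new ArrayList<>();

    // Priority uses the byte range of SLEE SBB priorities (-128..127).
    void attach(String name, int priority, Consumer<String> handler) {
        if (priority < Byte.MIN_VALUE || priority > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        attached.add(new Attachment(name, priority, handler));
        // List.sort is stable, so equal priorities keep their attach order.
        attached.sort(Comparator.comparingInt((Attachment a) -> a.priority).reversed());
    }

    // Delivers the event highest priority first; returns the delivery order.
    List<String> deliver(String event) {
        List<String> order = new ArrayList<>();
        for (Attachment a : attached) {
            order.add(a.name);
            a.handler.accept(event);
        }
        return order;
    }
}
```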

To learn more about SLEE, you can start with the following white papers and articles:
http://tinyurl.com/8appk

Pros and Cons of JAIN SLEE

Pros:

Protocol Abstraction: SLEE promotes multiple planes of abstraction. However, it does not go as far as hiding the asynchronous nature of call signaling.
Scalability: Mostly stateless, well defined structures for replication and persistence. Asynchronous messaging allows the runtime engine to quickly route important calls. A simple event routing algorithm, which lends itself to multi-threading.
Component model: Strong component model reflecting best practices derived from vast experience with telecom systems. Rigid mathematical model for event routing and well defined state machines for the life cycle of each component should make it safe to switch between compliant vendor implementations.
J2EE friendly: SLEE APIs heavily borrow from J2EE. There is also a set of best practices for interoperability between SLEE and J2EE.

Cons:

Steep learning curve: Developers will likely need initial training before they can write proper SLEE applications. Availability of visual tools, self-guided tutorials and best practices will be essential to flatten the learning curve.
Integration testing: Well-designed, self-contained SBBs should be easy to unit test. However, tests for end-to-end scenarios that involve multiple SBBs are harder to write, because the flow sequence can be time sensitive. The SLEE TCK provides a good base for writing end-to-end tests and offers good examples, which alleviates the problem to some extent. There seems to be a need for a testing framework that further simplifies matters.

Conclusion

In this text we looked at three viable VoIP middleware platforms: JAIN SLEE, SIP Servlets and Parlay/OSA. We saw some of their key differences and similarities. Since each of the three open standards has been implemented by multiple vendors and deployed in real-life production systems, they have all proven their qualities.

Adoption, however, remains limited to relatively small communities compared to mainstream middleware such as J2EE or .NET. A Google search returns 8,370 results for Parlay/OSA, 40,300 for JAIN SLEE, and 45,800 for SIP Servlet. Compare that to 8,870,000 for J2EE and 40,900,000 for Microsoft .NET. The question remains which VoIP platform, if any, will come close to these numbers.

For completeness, it should be noted that there are reports of alternative solutions that bypass the aforementioned VoIP platforms altogether. For example, a click-to-dial application can be constructed with a plain HTTP Servlet container and a JAIN SIP library.

So there it is. Now that the cards are on the table, which direction are you most likely to take?

Acknowledgements

I would like to thank Ranga, a respected expert in the JAIN community, for taking time out of his weekend to review this text and provide important corrections. I would also like to thank Phelim O'Doherty, Swee Lim and the rest of the people who posted comments on this text since it was initially published.


Thursday, August 04, 2005

JAIN SLEE, SIP Servlets, and Parlay/OSA (2nd Ed)
