Friday, May 12, 2017

What is Thrift

Thrift:
Point 1:

An RPC framework in general is a set of tools that enable the programmer to call a piece of code in a remote process, be it on a different machine or just another process on the same machine.

In the particular case of Apache Thrift, we talk about a framework designed to be efficient and available across both OS platforms and programming languages. Additionally, you have some flexibility regarding transports (such as sockets, pipes, etc.) and protocols (binary, JSON, even compressed), plus some more options like SSL or SASL support.

For example, you may set up a server on a Linux machine, written in C++, which offers some service to the world through a JSON-based protocol over HTTP. This service may be called by a client program written in Python, running on a Windows machine. The code for both server and client is generated from a Thrift IDL file. To get it running, you basically only have to add the intended program logic and put all the pieces together.

The single best reference for Apache Thrift is still the Apache Thrift Whitepaper. Although slightly outdated in some of the details, the underlying concepts are still valid. Another good read is Diwaker Gupta's "Missing Guide", and last but not least the forthcoming book from Randy Abernethy.

For beginners, I would recommend starting with the Apache Thrift tutorial suite; these examples show many of the core features. If you run into questions, you are welcome to ask on Stack Overflow or on the Thrift mailing lists.

Point 2:
RPC (Remote Procedure Call) is like calling a function, except that the function lives remotely, on a different server, as a service. A service exposes many such functions/procedures to its clients. A client needs some way to know which functions/procedures the service exposes and what their parameters are.

This is where Apache Thrift comes in. It has its own "Interface Definition Language" (IDL). In this language you define the functions and their parameters, and then use the Thrift compiler to generate the corresponding code for any supported language of your choice. This means that you can implement a function in Java, host it on a server, and then call it remotely from Python.
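To make the idea of an interface definition concrete, here is a minimal sketch. The `Calculator` service, its `add` function, and the class names are hypothetical and chosen only for illustration. The IDL is shown as a plain string and is not parsed, and the Python classes are hand-written stand-ins for the kind of interface a compiler could generate, not actual Thrift compiler output.

```python
from abc import ABC, abstractmethod

# Hypothetical Thrift IDL for illustration; kept as a string, not compiled here.
CALCULATOR_IDL = """
service Calculator {
  i32 add(1: i32 a, 2: i32 b)
}
"""


class CalculatorIface(ABC):
    """Hand-written stand-in for an interface generated from the IDL."""

    @abstractmethod
    def add(self, a: int, b: int) -> int:
        """Return the sum of a and b."""


class CalculatorHandler(CalculatorIface):
    """The only part the developer writes: the actual program logic."""

    def add(self, a: int, b: int) -> int:
        return a + b


print(CalculatorHandler().add(2, 3))
```

The point of the split is that the interface is derived from the IDL, while the handler holds the logic that is specific to the service.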

The important work a framework like Thrift does is this:
  1. It provides a language-agnostic Interface Definition Language.
  2. It provides a compiler that compiles this IDL to produce client and server code (in the same or in different languages, as required).
  3. The compiler-generated client code exposes stub interfaces for these functions. The stub code converts the parameters passed to the function into a binary (serialized) format that can be transported over the network. This process is called marshaling. The generated client code never contains the actual implementation of the function, hence it's called a stub.
  4. On the server, the developer uses the compiler-generated server code to actually implement these functions (i.e. write the actual functionality of the function). The generated server-side code receives the binary-encoded message from the client, converts it back into the corresponding language objects, and passes them to the developer-implemented function. This is called unmarshaling. In Java, for example, the compiler-generated server code includes an interface that the developer implements, along with various other classes.
  5. Similarly, the result of a function is converted to binary and sent to the client.
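The steps above can be sketched in a few lines of Python. This is a toy illustration only: the wire format (a length-prefixed function name followed by two 32-bit integers) is an assumption made up for this example and is not Thrift's binary protocol, and `CalculatorStub`, `CalculatorHandler`, and `serve` are hypothetical names. The "network" is replaced by a direct function call so the example runs on its own.

```python
import struct


def marshal_call(name: str, a: int, b: int) -> bytes:
    """Client side: turn a function name and two ints into bytes."""
    encoded = name.encode("utf-8")
    return struct.pack("!H", len(encoded)) + encoded + struct.pack("!ii", a, b)


def unmarshal_call(payload: bytes):
    """Server side: turn the bytes back into a name and two ints."""
    (name_len,) = struct.unpack_from("!H", payload, 0)
    name = payload[2:2 + name_len].decode("utf-8")
    a, b = struct.unpack_from("!ii", payload, 2 + name_len)
    return name, a, b


class CalculatorHandler:
    """Developer-written implementation that lives on the server."""

    def add(self, a: int, b: int) -> int:
        return a + b


def serve(payload: bytes, handler) -> bytes:
    """Unmarshal the request, call the handler, marshal the result."""
    name, a, b = unmarshal_call(payload)
    result = getattr(handler, name)(a, b)
    return struct.pack("!i", result)


class CalculatorStub:
    """Client stub: same signature as the real function, no real logic."""

    def __init__(self, transport):
        self._transport = transport  # any callable mapping bytes -> bytes

    def add(self, a: int, b: int) -> int:
        reply = self._transport(marshal_call("add", a, b))
        (result,) = struct.unpack("!i", reply)
        return result


handler = CalculatorHandler()
# Stand-in for a socket: hand the request bytes straight to the server code.
stub = CalculatorStub(lambda payload: serve(payload, handler))
print(stub.add(2, 3))
```

From the caller's point of view `stub.add(2, 3)` looks like a local call, while the addition itself happens in the handler.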

For the parameters of a function, the IDL defines its own set of data types: containers and structured types such as list, map, and struct, in addition to base types such as integers, strings, and booleans. These are then mapped to the corresponding implementations in each target language.
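As a rough illustration of that mapping, the sketch below shows how a struct with base and container fields could look on the Python side. The `UserProfile` struct and its field names are hypothetical, and the dataclass is a hand-written stand-in rather than code produced by the Thrift compiler.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    """Hypothetical struct: base types plus list and map containers."""

    uid: int                      # integer base type
    name: str                     # string base type
    active: bool = True           # boolean base type
    tags: List[str] = field(default_factory=list)          # list container
    scores: Dict[str, int] = field(default_factory=dict)   # map container


profile = UserProfile(uid=1, name="alice", tags=["admin"], scores={"math": 90})
print(asdict(profile))
```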

Thrift is similar to SOAP and CORBA, since all of them are used for RPC and provide their own IDL. CORBA and SOAP generally also have a service discovery broker as middleware for exposing functions/methods to clients. For Thrift, we normally use ZooKeeper for service discovery.

REST is different because it does not have an IDL; it uses HTTP methods such as GET and PUT, together with URL patterns, to call a remote function and pass parameters. Using HTTP methods and URL semantics also makes it language agnostic.
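For comparison, here is a small sketch of how a REST-style call is expressed purely through an HTTP method, a URL pattern, and a body. The host `example.com`, the `/users/<id>` path, and the `build_update_user_request` helper are hypothetical; the request is only constructed, never sent.

```python
import json
import urllib.request


def build_update_user_request(user_id: int, name: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates one user."""
    url = f"http://example.com/users/{user_id}"  # hypothetical endpoint
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_update_user_request(42, "alice")
print(request.get_method(), request.full_url)
```

There is no generated stub here: the "function" is identified by the method and URL, and any language with an HTTP client can make the same call.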

A message queue is entirely different, because it is mostly used in a publish/subscribe model, whereas RPC follows a client/server model.

In publish/subscribe, multiple publishers add serialized messages to a queue. The message format is defined by the publisher, which has complete control over it. The message definition is semantically associated with the queue on which the messages are published, but there is no strict checking of their structure. A subscriber, knowing the kind of message a queue will carry, subscribes to those messages. Publishers don't know who the consumers are, and subscribers don't know who produced a message. They only know what kind of message to publish to, or consume from, a queue. The publisher and the subscriber are responsible for knowing the right serializer and deserializer.
This is different in client/server RPC, where the client knows (in a strict sense) what to pass, because the server defines it, and also whom to pass it to.
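The decoupling described above can be sketched with an in-process queue. This is a simplified stand-in, assuming a single queue from Python's standard library instead of a real message broker; the `orders_queue` name and the message fields are hypothetical. Note that JSON is simply a convention both sides agree on, and nothing checks the message structure.

```python
import json
import queue

orders_queue = queue.Queue()  # hypothetical queue; stands in for a broker


def publish(q: queue.Queue, message: dict) -> None:
    """Publisher: serialize and enqueue, with no knowledge of consumers."""
    q.put(json.dumps(message).encode("utf-8"))


def consume_all(q: queue.Queue) -> list:
    """Subscriber: dequeue and deserialize, with no knowledge of producers."""
    messages = []
    while not q.empty():
        messages.append(json.loads(q.get().decode("utf-8")))
    return messages


publish(orders_queue, {"order_id": 1, "item": "book"})
publish(orders_queue, {"order_id": 2, "item": "pen"})
received = consume_all(orders_queue)
print(received)
```

The publisher never calls the subscriber and gets no return value, which is the main contrast with an RPC call.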

Other libraries similar to Thrift are Protobuf and Avro.
Finagle by Twitter is one way to create Thrift-based services; it also has support for Protobuf.

Point 3:
There's a lot of repeated work you have to do when you're writing a server - primarily designing a protocol and writing code to serialize and deserialize messages on the protocol, but also dealing with sockets and managing concurrency, and writing clients in many languages. Thrift automatically does all of this, given a description of the functions you want to expose from your server to clients. It's also useful for serializing data on disk or into shared memory (where many of the same problems come up).
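To give a feel for that repeated work, the sketch below hand-writes a tiny server using only Python's standard library: an ad-hoc text protocol, parsing, sockets, and one thread per connection. The one-line `add <a> <b>` protocol and the names `AddHandler`, `call_add`, and `demo` are assumptions invented for this example; this is the kind of plumbing a framework like Thrift generates or provides, not Thrift code itself.

```python
import socket
import socketserver
import threading


class AddHandler(socketserver.StreamRequestHandler):
    """Hand-written protocol: one line 'add <a> <b>' in, one line out."""

    def handle(self):
        line = self.rfile.readline().decode("utf-8").strip()
        parts = line.split()
        if len(parts) == 3 and parts[0] == "add":
            reply = str(int(parts[1]) + int(parts[2]))
        else:
            reply = "error"
        self.wfile.write((reply + "\n").encode("utf-8"))


def call_add(address, a: int, b: int) -> int:
    """Hand-written client: connect, serialize, send, read, deserialize."""
    with socket.create_connection(address) as conn:
        conn.sendall(f"add {a} {b}\n".encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return int(reader.readline())


def demo() -> int:
    """Start a threaded server on a free local port and call it once."""
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), AddHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            return call_add(server.server_address, 2, 3)
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())
```

Every new function, data type, or client language would mean extending this code by hand, which is the duplication the paragraph above refers to.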

Application:
Juniper JET uses Thrift:
https://www.juniper.net/documentation/en_US/jet1.0/topics/concept/jet-overview.html


Reference:
http://stackoverflow.com/questions/20653240/what-is-rpc-framework-and-apache-thrift
http://thrift.apache.org/static/files/thrift-20070401.pdf
http://diwakergupta.github.io/thrift-missing-guide/thrift.pdf
https://www.quora.com/In-simple-terms-what-is-Thrift-software-framework-and-what-does-it-do

Demo:
https://www.youtube.com/watch?v=NK6hz2JM89w
