Google open sources data-moving tool

Protocol Buffers 'simple' alternative to XML

Tags: open source, google

By Matthew Broersma

Published: 11 July 2008 08:27 BST

Google has open sourced an internal development tool called 'Protocol Buffers', a data description language that forms a basic part of the operation of the company's vast computing cluster.

The tool, which has been in use at Google for several years, is the means by which the company encodes almost any sort of structured information that needs to be passed across the network or stored on disk, Google open-source programs manager Chris DiBona said in a blog post announcing the move.

Protocol Buffers could be useful for other organisations that need an efficient way to move structured data around a network, for instance in large clusters or data centres, DiBona said.

Google uses thousands of data formats for networked messages, and XML is simply too cumbersome to use as an encoding method for it all, Google software engineer Kenton Varda explained in a separate blog post. "As nice as XML is, it isn't going to be efficient enough for this scale," he wrote. "When all of your machines and network links are running at capacity, XML is an extremely expensive proposition."

Various other methods exist for passing encoded data over networks, but Google found that none of them suited its particular need, which was for a system optimised for efficiency above everything else, Varda said. Protocol Buffers is a sort of interface definition language (IDL), but IDLs have a reputation for being over-complicated, he said.

He said: "One of Protocol Buffers' major design goals is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems, and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated."
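As a rough illustration of what a "lists-and-records" model means, the sketch below describes structured data using only records (named, typed fields) and lists (repeated values). It is written in plain Python rather than in Protocol Buffers' own definition language, and the `Person` and `PhoneNumber` types and their fields are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical record types, invented for illustration only.
@dataclass
class PhoneNumber:
    number: str
    kind: str = "home"  # a field with a default value


@dataclass
class Person:
    name: str
    id: int
    # A list field: zero or more nested records.
    phones: List[PhoneNumber] = field(default_factory=list)


person = Person(name="Ada", id=1)
person.phones.append(PhoneNumber(number="555-0100", kind="work"))
print(person)
```

Everything in this sketch is either a record or a list of records, which is the kind of restricted model the quote describes.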

He estimated that the system is at least an order of magnitude faster than XML, while other Google documentation said Protocol Buffers can be parsed 20 to 100 times faster than XML. The binary files produced by Protocol Buffers are three to 10 times smaller than comparable XML files, Google said. Google released an FAQ detailing Protocol Buffers, along with source code for the Java, Python, and C++ protocol buffer compilers.

Google admitted that the system is comparable to long-established projects such as JavaScript Object Notation (JSON), which is often used in Ajax web programming. But JSON, like XML, is a human-readable text format, rather than a binary format such as Protocol Buffers, a fact that reduces JSON's efficiency, Google said.
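The size difference between a text format and a binary format can be sketched with Python's standard library alone. The example below is not Protocol Buffers' actual encoding: it uses the `struct` module with a fixed layout as a stand-in for "a binary format", and the `record` contents are made-up values chosen for illustration.

```python
import json
import struct

# Hypothetical record; the field names and values are invented.
record = {"id": 12345, "latitude": 51.5074, "longitude": -0.1278}

# Text encoding: field names and numbers are spelled out as characters.
text_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: a layout agreed in advance by sender and receiver
# (one unsigned 32-bit integer and two 64-bit floats, little-endian),
# so no field names or digits need to be transmitted.
binary_bytes = struct.pack(
    "<Idd", record["id"], record["latitude"], record["longitude"]
)

print("text:  ", len(text_bytes), "bytes")
print("binary:", len(binary_bytes), "bytes")

# The receiver recovers the values using the same agreed layout.
decoded = struct.unpack("<Idd", binary_bytes)
```

The trade-off is the one the article describes: the binary form is smaller but cannot be read without knowing the layout, whereas the JSON text is self-describing and human-readable.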

Even so, Google was criticised on some fronts for creating its own system from scratch and ignoring existing approaches. David Golightly, user experience developer lead for Zillow.com, argued that the textual syntax used in Protocol Buffers could have been made interoperable with an existing text-based format.

Golightly said in a blog post: "I'm always just a little disappointed when someone goes about creating their own new textual format syntax on arbitrary grounds, rather than adapting an existing format to their needs."

Google is not the first to open source its internal data interchange system: Protocol Buffers is very similar to the Thrift framework, developed by Facebook and now an open-source project in the Apache Software Foundation Incubator. Thrift, however, differs in that it describes services rather than pure data.

Original article: Google open sources 'Protocol Buffers' from ZDNet UK
