Understanding XML
XML, which stands for “eXtensible Markup Language”, is a new information format which allows data portability: the ability for different computer systems to exchange data without the need for specially written file translation programmes or “interfaces”. The significance of the name XML is:
“Extensible” because it is not limited to one use or application.
“Markup” because it “marks up” or identifies information, particularly the fields or elements which comprise a message. The way in which this is done is called “tagging”.
“Language” because it is a set of rules or a “protocol”.
The US General Accounting Office has described XML as “a flexible, non-proprietary set of standards for annotating or “tagging” information so that it can be transmitted over a network such as the internet and readily interpreted by disparate computer systems.”
Technically, XML (often referred to as “the next generation HTML”) may be described as a “platform-independent, self-describing, expandable, standard data exchange format that can be either used independently or embedded and used within other solutions.” What does this mean?
“Platform-independent” because it enables different systems to talk to one another.
“Self-describing” because each piece of data or information (each “field” or “element”) within a message carries a description of what that data is, for example “date” or “invoice number”.
“Expandable” because one can add to the language as new requirements emerge.
“Standard data exchange format” – like TWIST, the standard is an open, free and web-compatible protocol, allowing computers to share information.
“Used independently or embedded” – since it separates the data from the applications, it can be reused for multiple purposes.
“Used with other solutions” – it builds on existing technology investments.
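The “self-describing” property can be illustrated with a minimal Python sketch. The message and its element names (Invoice, InvoiceNumber, Date, Currency) are hypothetical and are not taken from any TWIST schema; the point is only that a receiving program can discover what each value means from the tag that surrounds it.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the element names are illustrative assumptions.
MESSAGE = """
<Invoice>
  <InvoiceNumber>INV-1001</InvoiceNumber>
  <Date>2003-12-04</Date>
  <Currency>GBP</Currency>
</Invoice>
"""

def describe(xml_text):
    """Return {element name: value} for each child of the root element."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text.strip() for child in root}

fields = describe(MESSAGE)
for name, value in fields.items():
    # Each value arrives together with the name that describes it.
    print(f"{name}: {value}")
```

The receiver needs no prior agreement about field positions or widths, only about the element names.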
How important is XML?
Three quotes (among very many):
Oracle: “We are moving to a new age of data portability, one in which XML will play a key role.”
Boeing: “We have defined an XML-based architecture to enable interoperability and electronic data interchange for internal use and external communication…”
BancBoston: “XML will revolutionize the way data is stored, processed and retrieved”
How do you use XML?
XML is only a “meta-language”: dialects of it need to be developed as standards to cater for different types of messages. TWIST’s objective is to create global standards for three key, inter-related areas of the financial sector:
- Wholesale financial markets transaction processing
- Commercial payments & collections and working capital management
- Cash management
Why XML?
Data Integrity
When stored as XML files, data can be easily shared and transferred across various systems, whether they are internal to an organisation or spread across different organisations. Whilst the same can also be said of proprietary data formats, flat files, comma-separated values (CSV) files and so on, XML brings with it an additional dimension designed to increase data integrity.
Consider the case of Company-A supplying Company-B with product data in CSV format. Should Company-A inadvertently ‘slip’ some bad data into the CSV file, the chances are that the file will be imported into the system of Company-B and processed with no knowledge of the bad data or, worse still, that the bad data will cause the Company-B system to crash. XML, on the other hand, is self-checking: a validating parser ‘knows’ how the data in an XML file should be presented and will complain should any system attempt to supply bad data. This is possible because of the XML Schema Definition (XSD), which goes hand-in-hand with an XML file and describes the exact make-up of the data within that file.
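The contrast can be sketched in Python using only the standard library. The product data below is hypothetical. Note the limits of the sketch: it shows only the first layer of checking, namely that an XML parser rejects a document which is not well-formed, whereas a CSV reader accepts a short row without complaint. Full validation against an XSD needs a validating parser, which the Python standard library does not provide; a third-party library such as lxml would be one option.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical product data: the second data row is missing its price field.
BAD_CSV = "sku,name,price\nA1,Widget,9.99\nB2,Gadget\n"

# A comparable slip in XML: the <Price> element is never closed.
BAD_XML = "<Products><Product><Sku>B2</Sku><Price>9.99</Product></Products>"

def load_csv(text):
    """Read every row; the csv module returns short rows without complaint."""
    return list(csv.reader(io.StringIO(text)))

def load_xml(text):
    """Return the parsed root, or None if the document is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print(f"rejected: {err}")
        return None

rows = load_csv(BAD_CSV)
print(rows[2])             # the short row is accepted silently
root = load_xml(BAD_XML)   # the malformed document is refused
```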
Data Hierarchy
When Company-A wants to send a CSV data file to Company-B detailing hierarchical data, such as a list of invoices and their ‘child’ line items, this can often lead to overcomplicated data-file designs, such as multiple files representing the same data.
XML overcomes this unnecessary overcomplication by providing design mechanisms with which hierarchical data can be defined within a single file. “Nested elements” allow the data designer to define “consists of” relationships, whilst XML “attributes” accommodate ancillary or “meta” information.
XML is object-oriented in the sense of being suitable for describing objects of the real world or any abstract problem domain by modelling their properties as they are, instead of enforcing a normalized decomposition into various tables linked by relations. This makes XML documents more intuitively understandable and thereby reduces the time required both to design and to implement systems based on XML.
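As an illustration, invoices and their line items can travel in one file. The following Python sketch uses a hypothetical layout (the names Invoices, Invoice, LineItem, Description and Amount, and the attributes number and currency, are assumptions made for this example only). Line items are nested elements inside their invoice, while the attributes hold identifying details about it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical document: invoices and their 'child' line items in a single file.
INVOICES = """
<Invoices>
  <Invoice number="INV-1001" currency="GBP">
    <LineItem><Description>Widget</Description><Amount>20.00</Amount></LineItem>
    <LineItem><Description>Gadget</Description><Amount>5.50</Amount></LineItem>
  </Invoice>
  <Invoice number="INV-1002" currency="GBP">
    <LineItem><Description>Sprocket</Description><Amount>12.25</Amount></LineItem>
  </Invoice>
</Invoices>
"""

def invoice_totals(xml_text):
    """Sum the line-item amounts of each invoice, keyed by invoice number."""
    root = ET.fromstring(xml_text.strip())
    totals = {}
    for invoice in root.findall("Invoice"):
        amounts = [Decimal(item.findtext("Amount"))
                   for item in invoice.findall("LineItem")]
        totals[invoice.get("number")] = sum(amounts, Decimal("0"))
    return totals

totals = invoice_totals(INVOICES)
for number, total in totals.items():
    print(number, total)
```

The parent-child relationship is carried by the nesting itself, so no second file or linking key is needed.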
Human-Readable
Whilst it may not appear to be a major advantage, the ability for a human to read XML data can greatly improve general support and even development efforts. As an example, try to find the error in this data:
GBPNABB20031204335423
And now this data:
<Currency>GBP</Currency>
<Bank>NAB</Bank>
<Year>2003</Year>
<Month>12</Month>
<Day>04</Day>
<Hour>33</Hour>
<Minute>54</Minute>
<Second>23</Second>
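Because each value is labelled, the same check can also be automated. The Python sketch below wraps the fragment in a hypothetical Timestamp root element (an XML document needs a single root) and tests each time field against an assumed range table; neither the root name nor the limits come from a published schema.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, wrapped in a hypothetical <Timestamp> root.
FRAGMENT = """
<Timestamp>
  <Currency>GBP</Currency>
  <Bank>NAB</Bank>
  <Year>2003</Year>
  <Month>12</Month>
  <Day>04</Day>
  <Hour>33</Hour>
  <Minute>54</Minute>
  <Second>23</Second>
</Timestamp>
"""

# Assumed valid ranges for the date and time fields (inclusive).
LIMITS = {
    "Month": (1, 12),
    "Day": (1, 31),
    "Hour": (0, 23),
    "Minute": (0, 59),
    "Second": (0, 59),
}

def out_of_range(xml_text):
    """Return (element name, value) for every field outside its range."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    for child in root:
        if child.tag in LIMITS:
            low, high = LIMITS[child.tag]
            value = int(child.text.strip())
            if not low <= value <= high:
                problems.append((child.tag, value))
    return problems

problems = out_of_range(FRAGMENT)
print(problems)
```

The check flags the Hour element, whose value of 33 is the error hidden in the flat string.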
XML is being widely adopted by the computer industry
One key factor in the success of the Internet was the wide adoption of the TCP/IP protocol suite by many corporations. This resulted in huge sales volumes and consequently ever-decreasing prices for all network components used.
XML is widely accepted and implemented by many vendors, which will result in higher volumes and lower prices for software components. The lack of such volume is why XML’s predecessor, SGML, was never successful on a broad scale: SGML products were typically priced in the ten-thousand dollar range, whereas XML products are today priced in the hundreds.
XML is Global
To better understand the attention that XML has received, it is useful to recall another widely-adopted data standard that everybody takes for granted today: ASCII, the American Standard Code for Information Interchange.
While ASCII was restricted to a certain alphabet and writing system, it was still crucial in allowing different computer types and operating systems to freely exchange data. With the adoption of Unicode 1.0 and its continuing evolution, the idea of ASCII was expanded to encompass all languages and writing systems of the world.
Today, it is taken for granted that computers are capable of reading and processing text documents based on ASCII or Unicode. XML takes this approach one step further, by building on Unicode and defining a universal way to describe structured data for all different purposes.
All XML documents are by definition Unicode-based, but may be stored on disk or transmitted over the network in various “encodings”, such as ISO-8859-1 or UTF-8. This universality is why some people today call XML the “ASCII of the future”.
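The distinction between the Unicode content of a document and its encoding can be sketched in Python. The City element and the helper function below are hypothetical. The same text is serialised once as ISO-8859-1 and once as UTF-8; the bytes differ, but a parser that reads the encoding named in the XML declaration recovers the same Unicode string from both.

```python
import xml.etree.ElementTree as ET

TEXT = "Z\u00fcrich"  # contains a non-ASCII character (u with diaeresis)

def encode_document(value, encoding):
    """Build a one-element document and serialise it in the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><City>{value}</City>'
    return doc.encode(encoding)

latin1_bytes = encode_document(TEXT, "ISO-8859-1")
utf8_bytes = encode_document(TEXT, "UTF-8")

# The parser honours the encoding named in each XML declaration.
latin1_city = ET.fromstring(latin1_bytes).text
utf8_city = ET.fromstring(utf8_bytes).text

print(latin1_bytes != utf8_bytes)   # different bytes on disk or on the wire
print(latin1_city == utf8_city)     # same Unicode text once parsed
```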