Introduction to XML workshop
5 Oct 2005
TTU Advanced Technology Learning Center
Taught by Bosah Chukwuogo
All markup languages have similar characteristics
XML is eXtensible Markup Language
- extensible: capable of being extended
- markup language: a system for marking or tagging data to indicate its logical structure
- it is just a markup language
XML marks data to indicate its logical structure
Handouts, PowerPoint Slides
Student1, Student2, Student3, etc
HTML is built like XML
1. XML was born out of the need to describe data in clear text so that it can be interpreted across different languages and environments, and easily transported
2. Since this data must work on different, incompatible platforms, it must independently describe itself
3. SGML was used by book publishers to mark up books, and it is also used in theses; without SGML or another formatting language you can’t create those massive encyclopedias
4. XML is now a universal standard
Example 1 showing logical structure for describing a list of authors
<?xml version="1.0"?>
As long as a file has the XML declaration at the front, it is an XML file (regardless of extension)
- declaration can only appear once in a file, and it must be closed
Elements in an XML file are just entities: markup entities
Everything has to be properly tagged: opening and closing tags, to be compliant with XML standards
- this is not negotiable
Child and parent relationships apply to the entities that you are creating
- this is like indenting in unordered and ordered lists in a word processor
In this example:
This is opening tag – element value – closing tag
XML elements can also have attributes, which provide additional information about the elements. Example:
<?xml version="1.0"?>
These attributes play a very important role when you are describing complex relationships
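A minimal sketch of the opening tag, element value, closing tag pattern plus an attribute, read with Python's standard xml.etree.ElementTree module. The authors/author/name elements and the id attribute are made-up names for illustration, not the ones from the workshop handout.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for this sketch.
doc = """<?xml version="1.0"?>
<authors>
  <author id="a1">
    <name>Jane Doe</name>
  </author>
</authors>"""

root = ET.fromstring(doc)      # <authors> is the parent of <author>
author = root.find("author")   # <author> is the parent of <name>
name = author.find("name")

# tag = the element's name, text = the element value, get() = an attribute value
print(name.tag, name.text, author.get("id"))
```

The attribute (id) describes the author element itself, while the child element (name) holds data that belongs to it.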
Before you just had websites that had information, people had to come to your website and see what you had in there
- the idea behind RSS: each time your website changes, you publish an XML file (a feed) with information about what has changed on your website
- so I don’t have to always go to your website to check it for changes
- you are pushing out the content to users
- XML is the driving force making the blogging revolution possible
- XML is a way to structure data in certain predefined formats
XML uses tags to contain and structure data
XML is platform independent, language independent, can be sent via FTP, HTTP, etc.
Attributes have to have values, this is incorrect:
This is correct
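As a rough illustration of the incorrect and correct forms, the sketch below feeds both to Python's standard XML parser. The book element and the signed attribute are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical element and attribute names.
missing_value = "<book signed>XML Basics</book>"        # attribute with no value
with_value = '<book signed="yes">XML Basics</book>'     # attribute with a quoted value

print(parses(missing_value), parses(with_value))
```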
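Certain characters cannot be used as-is in element content
- like an expression: 4 > 5
- XML cannot read these as plain text because the characters <, > and & are special: they are part of the markup syntax (< and & are always rejected; > is escaped by convention)
- You need to encode them as entity references: &lt; &gt; &amp;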
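A small sketch of the encoding step, assuming Python's standard xml.sax.saxutils.escape helper, which replaces &, < and > with their entity references. The expr element name and the sample expression are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

expression = "4 > 5 && 3 < 4"

# escape() turns & into &amp;, < into &lt; and > into &gt;
encoded = escape(expression)
doc = "<expr>" + encoded + "</expr>"

# The parser decodes the entity references back into the original characters.
decoded = ET.fromstring(doc).text

# Without encoding, the same text is rejected by the parser.
try:
    ET.fromstring("<expr>" + expression + "</expr>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(encoded)
```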
XML has a tree structure: nesting tags inevitably results in a tree (parent tags, child tags, etc)
Child tags indicate ownership of attributes / characteristics
In a diagram of an XML file’s tree structure, each circle is called a node; a node-set is a node and its descendants (a tree branch)
- some people refer to nodes at the same level as siblings
- some people call child nodes descendants
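To make the tree idea concrete, here is a sketch that walks a small hypothetical document and records each node with its depth. The author/publisher/name structure is assumed for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration.
doc = """<author>
  <name>Jane Doe</name>
  <publisher>
    <name>Example Press</name>
  </publisher>
</author>"""

root = ET.fromstring(doc)

def walk(node, depth=0):
    """Yield (depth, tag) for a node and all of its descendants."""
    yield depth, node.tag
    for child in node:
        yield from walk(child, depth + 1)

outline = list(walk(root))                # the root node plus every descendant
siblings = [child.tag for child in root]  # nodes at the same level under <author>

for depth, tag in outline:
    print("  " * depth + tag)
```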
XPath is a way of identifying exactly where you are in the XML tree
- everything in an XML file has a unique location
- - allows you to pinpoint the exact location of data
- Like breadcrumbs in a hierarchically organized website
- Author/publisher/address/street
XPath is a query language like SQL that lets you run queries against XML files
- XPath differentiates between author/name and author/publisher/name
- XPath is a specification (a standard)
- There are several XPath software tools that let you query XML files, both open source and closed source
XPath will let you specify complex queries and find anything in an XML file
- XML files can get very big
- Even RSS feeds get huge, it is too hard to visually inspect them
- Need some kind of tool to help navigate and work with XML files
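One such tool is Python's standard ElementTree module, which supports a limited subset of XPath. The sketch below, on a hypothetical document, shows how author/name and author/publisher/name pick out different data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the names and values are invented for this sketch.
doc = """<authors>
  <author>
    <name>Jane Doe</name>
    <publisher>
      <name>Example Press</name>
      <address><street>1 Main St</street></address>
    </publisher>
  </author>
</authors>"""

root = ET.fromstring(doc)

# Paths are relative to the root element <authors>.
author_names = [e.text for e in root.findall("author/name")]
publisher_names = [e.text for e in root.findall("author/publisher/name")]
streets = [e.text for e in root.findall("author/publisher/address/street")]

# ".//name" matches every <name> at any depth, in document order.
all_names = [e.text for e in root.findall(".//name")]

print(author_names, publisher_names, streets)
```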
Name spaces: An XML file can have a name space
- allows you to stay in your own space of names, that is, what your definitions are
- don’t worry about these, they don’t come up as much in real life
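For the times they do come up, here is a sketch of two tags with the same name kept apart by namespaces. The prefixes bk and mv and the example.com URIs are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefixes and namespace URIs.
doc = """<catalog xmlns:bk="http://example.com/books"
         xmlns:mv="http://example.com/movies">
  <bk:title>XML Basics</bk:title>
  <mv:title>XML: The Movie</mv:title>
</catalog>"""

root = ET.fromstring(doc)

# Map the prefixes used in queries to the namespace URIs.
ns = {"bk": "http://example.com/books", "mv": "http://example.com/movies"}
book_title = root.find("bk:title", ns).text
movie_title = root.find("mv:title", ns).text

# ElementTree stores each tag as {namespace-uri}localname.
tags = [child.tag for child in root]

print(book_title, "|", movie_title)
```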
Have element nodes, attribute nodes, text nodes, comment nodes, and namespace nodes
Another way to close a tag is with a forward slash at the end of the tag itself; the XML reader looks backward to see which tag is being closed
Alternative format:
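A sketch of the two forms side by side, using a hypothetical photo element with no content. A parser should treat them the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: separate closing tag vs. forward slash inside the tag.
long_form = ET.fromstring('<photo src="cat.jpg"></photo>')
short_form = ET.fromstring('<photo src="cat.jpg"/>')

same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and long_form.text == short_form.text
    and len(long_form) == len(short_form)
)
print(same)
```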
Issue of interoperability is huge with XML
- you have to define what your file means
- attributes must always have open and closed quotes
- browsers tend to forgive these kinds of things, but XML does not
- make sure your XML as well as HTML is always well formed
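The difference between a forgiving browser and a strict XML parser can be sketched with Python's standard parser, which rejects each of the mistakes below. The sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def well_formed_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

# Hypothetical snippets: one correct, three with common mistakes.
samples = {
    "quoted attribute": '<p class="intro">Hello</p>',
    "unquoted attribute": "<p class=intro>Hello</p>",
    "unclosed tag": "<p>Hello",
    "overlapping tags": "<b><i>Hello</b></i>",
}

results = {label: well_formed_error(text) is None for label, text in samples.items()}

for label, ok in results.items():
    print(label, "->", "well formed" if ok else "rejected")
```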
Tag names cannot contain spaces
- certain characters are not allowed: ampersand (&) must be replaced with &amp;
XML is case sensitive
- to the computer
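A short sketch of what case sensitivity means in practice: Name and name are two different tags, and an opening tag cannot be closed with a differently cased one. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# <Name> and <name> are distinct elements that can sit side by side.
root = ET.fromstring("<author><Name>Jane Doe</Name><name>J. Doe</name></author>")
upper = root.find("Name").text
lower = root.find("name").text

# Opening with <Name> and closing with </name> is a mismatch.
try:
    ET.fromstring("<Name>Jane Doe</name>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False

print(upper, "|", lower, "|", mismatched_ok)
```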
An example you see of a needed ampersand code is &copy; (©)
There are multiple ways to represent information contained in a table in an XML file
- bottom line is if the people consuming your information understand it
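Two common ways to lay out the same table rows, sketched with ElementTree: one puts each column in a child element, the other puts each column in an attribute. The students/student names and the grade column are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: one dict per row, keys are column names.
rows = [
    {"name": "Student1", "grade": "A"},
    {"name": "Student2", "grade": "B"},
]

def as_elements(rows):
    """One <student> per row, one child element per column."""
    root = ET.Element("students")
    for row in rows:
        student = ET.SubElement(root, "student")
        for column, value in row.items():
            ET.SubElement(student, column).text = value
    return root

def as_attributes(rows):
    """One <student> per row, one attribute per column."""
    root = ET.Element("students")
    for row in rows:
        ET.SubElement(root, "student", attrib=row)
    return root

element_style = ET.tostring(as_elements(rows), encoding="unicode")
attribute_style = ET.tostring(as_attributes(rows), encoding="unicode")

print(element_style)
print(attribute_style)
```

Either layout carries the same information; which one to use depends on what the consumers of the file expect.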
My question: Can Excel export as XML?
What does this all mean?
DTD: Document Type Definition
- if I consume your XML data, I have to know what it means so we can exchange data
- this was an early attempt allowing parties to agree on a format for XML exchange
- this is really out of date now
- you could validate XML documents
There is a difference between validation and having good syntax
- if you get an XML file and it means what you want it to mean, then it is valid
- the terms are used as synonyms at places like the W3C validator
Validating an XML file against a DTD verifies that
- proper nesting has occurred
- all required tags are present and accounted for
- specific units of information are of the correct type and fall within the specified legal values
XML that passes the DTD test is valid with respect to that DTD
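Python's standard library does not include a DTD validator, so the sketch below is only a hand-rolled stand-in: it separates "good syntax" (well formed) from "valid" by checking a made-up rule that every author element must contain name and publisher children. The rule table and the check function are hypothetical, not a real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a DTD: required children per element.
REQUIRED_CHILDREN = {"author": ["name", "publisher"]}

def check(text):
    """Return 'not well-formed', 'well-formed but invalid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    for element in root.iter():
        for required in REQUIRED_CHILDREN.get(element.tag, []):
            if element.find(required) is None:
                return "well-formed but invalid"
    return "valid"

print(check("<author><name>Jane</name>"))
print(check("<author><name>Jane</name></author>"))
print(check("<author><name>Jane</name><publisher>Example Press</publisher></author>"))
```

A real validator would also check nesting order, data types and legal values, which this sketch does not attempt.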
Shortfalls of DTDs
- could only work with dates up to 2000; there were certain dates you couldn’t represent
- time span representations were difficult / impossible
XML schemas are a more advanced version of DTDs with several advantages
- provides support for namespaces, to help resolve conflicts in tag names
- richer datatypes than DTDs
- user defined types called archetypes
- allowance for attribute grouping, many attributes often go together
Schema is just a way to validate an XML file
XML is a cousin to HTML and can be formatted with XSLT
XSLT style sheet controls the presentation
Also put the encoding attribute in the XML declaration so a browser knows how to read your XML (what character encoding it is in)
Also use stylesheet specification for formatting
XSLT helps you control formatting of content
XML + XSLT Stylesheet = HTML or XML or WML (wireless markup language for cell phones)
HTML output is dynamic based on what data is contained in the XML string
XSLT stylesheets are XML documents and conform to all the properties of XML
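Python's standard library has no XSLT processor, so the following is not XSLT. It is a plain-Python stand-in for the "XML in, HTML out" step, showing how the HTML changes with whatever data the XML contains. The authors_to_html function and the authors/author/name structure are hypothetical.

```python
import xml.etree.ElementTree as ET

def authors_to_html(xml_text):
    """Build an HTML <ul> with one <li> per <author>/<name> in the input."""
    source = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for name in source.findall("author/name"):
        ET.SubElement(ul, "li").text = name.text
    return ET.tostring(ul, encoding="unicode", method="html")

# Hypothetical input data.
html = authors_to_html(
    "<authors>"
    "<author><name>Jane Doe</name></author>"
    "<author><name>A. N. Other</name></author>"
    "</authors>"
)
print(html)
```

An XSLT stylesheet would express the same mapping declaratively, as templates written in XML, rather than as a loop.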