XML
RHS – SOC 2 Data vs. Information We often use the terms data and information interchangeably More precisely, data is some ”value” of a certain type, like –41 –”High Street 7” –false Data comes without a context
RHS – SOC 3 Data vs. Information When we provide a context for the data, the data ( + the context) becomes information, like: –The age of Per Laursen is 45 years –John Peterson lives at High Street 7 –Is Petra Wilson married? false Data + Context = Information
RHS – SOC 4 Data vs. Information We could also denote the context as ”data about the data” This is often referred to as meta-data Information is thus composed of: –Data –Meta-data
RHS – SOC 5 Data vs. Information This is – more or less – how we structure our communication with each other –The age of Per Laursen is 45 years –John Peterson lives at High Street 7 –Is Petra Wilson married? false Meta-data and data One part is not very useful without the other part…
RHS – SOC 6 Data vs. Information Of course, we are often somewhat ”implicit” when we communicate: –He is 22 years (who…?) –My dog is named Kaya (what kind of dog…?) –John is ill (Who is John, what illness…?) We sometimes assume part of the context implicitly, otherwise it would be very tedious to communicate…
RHS – SOC 7 Transmitting information When computers transmit information, they can also be more or less implicit. A method call is a kind of data transmission, which is highly implicit: CalculateFactorial(int n), where n is "the number for which we want to calculate the factorial"
RHS – SOC 8 Transmitting information Suppose a program needs to receive information about some product A product has –A name –A price –A weight How can we transmit this information to the program?
RHS – SOC 9 Transmitting information Perhaps just put the data into a file: ”Milk ” The meaning being: –The name of the product –The price of the product (in kroner) –The weight of the product (in grams) –Each data separated by a ” ”
RHS – SOC 10 Transmitting information The program can then just read the file, and ”decode” the data However, this assumes that sender and receiver of the data have agreed about how to interpret the file content!
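A minimal sketch of such a "decoding" receiver may make the hidden agreement visible. The separator `;`, the sample values and the class name `PlainProductDecoder` are assumptions made for illustration; the original file content is not recoverable from the slide.

```java
// Hypothetical receiver for a plain data file. The meta-data (field order,
// units, separator) lives only in this code, not in the data itself.
class PlainProductDecoder {
    static final class Product {
        final String name;
        final double priceKroner; // assumed unit: kroner
        final int weightGrams;    // assumed unit: grams

        Product(String name, double priceKroner, int weightGrams) {
            this.name = name;
            this.priceKroner = priceKroner;
            this.weightGrams = weightGrams;
        }
    }

    // Assumed agreement between sender and receiver: name;price;weight
    static Product decode(String line) {
        String[] parts = line.split(";");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + parts.length);
        }
        return new Product(parts[0].trim(),
                           Double.parseDouble(parts[1].trim()),
                           Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        Product p = decode("Milk;7.95;1000"); // made-up sample data
        System.out.println(p.name + ": " + p.priceKroner + " kr, " + p.weightGrams + " g");
    }
}
```

If the sender swaps two fields or adds a fourth, this receiver either fails or silently misreads the data, which is the coupling the following slides try to avoid.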
RHS – SOC 11 Transmitting information Advantages –A compact format, no space wasted –Fast to process Disadvantages –Static, hard to change –Receiver and sender tied to each other –What about other recipients? –Not humanly readable
RHS – SOC 12 Transmitting information Main problem: Meta-data is "encoded" in the receiving program. It is probably better to make meta-data explicit, to overcome the disadvantages. Use a "markup language" to include meta-data in the transmission
RHS – SOC 13 Markup languages In a markup language, we can "mark" data in a way which conveys the context. We mark the data with meta-data. An example of a markup language is HTML (HyperText Markup Language): This is <b>very</b> good
RHS – SOC 14 Markup languages The markings <b> and </b> are tags indicating that some meta-data should be applied to the data between the tags: write it in bold. In HTML, tags are only used for formatting of data; HTML contains meta-data about how data should be displayed. Enter XML!
RHS – SOC 15 What is XML…? eXtensible Markup Language Learn this!
RHS – SOC 16 XML XML can be seen as a generalisation of HTML: tags can be used for everything! All kinds of meta-data can be included as tags in XML. Important! XML does not define anything about presentation of data
RHS – SOC 17 XML A product defined in XML: <product> (start the product description) <name>Milk</name> <price>…</price> <weight>…</weight> </product> (end the product description)
RHS – SOC 18 XML XML is highly recursive. Inside a definition, we can have a number of "child" definitions. At some point, the definitions only contain data, like "Milk". A definition can also have attributes associated with it
RHS – SOC 19 XML Milk
RHS – SOC 20 XML When to use attributes vs. a child element: An attribute should not be data in itself; it should be information about some data element. Not a strict rule… When in doubt, use child elements
RHS – SOC 21 XML Tempting, but not in the spirit of XML… Harder to process by recipient
RHS – SOC 22 XML The general structure of an XML document is then –An XML declaration: <?xml version="1.0"?> –A root element containing the data –Inside the root element, all the child elements
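RHS – SOC 23 XML <?xml version="1.0"?> <products> <product> <name>Milk</name> … </product> <product> <name>Orange Juice</name> … </product> </products>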
RHS – SOC 24 Exercises Review: R23.2, R23.6
RHS – SOC 25 Processing XML documents How do we process an XML document, in order to retrieve data from it? We apply an XML parser to the document The XML parser transforms the XML document into a tree structure The tree structure follows the Document Object Model (DOM)
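RHS – SOC 26 Processing XML documents [Figure: DOM tree with root node products, child node product, and below it the nodes name, price and weight; the name node holds the text Milk]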
RHS – SOC 27 Processing XML documents
    // Needs: import java.io.File; import javax.xml.parsers.*; import org.w3c.dom.Document;
    DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = fac.newDocumentBuilder();
    String fileName = ...;
    File xmlFile = new File(fileName);
    Document doc = builder.parse(xmlFile);
    // Now doc contains the DOM tree...
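Once the parser has produced the DOM tree, the tree can be walked node by node. The following is a sketch only: it parses from a string instead of a file, and the sample document (including the price and weight values) and the class name `DomWalkDemo` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkDemo {
    // Hypothetical sample document; the values are invented.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<products><product>"
        + "<name>Milk</name><price>7.95</price><weight>1000</weight>"
        + "</product></products>";

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Appends one line per element, indented by its depth in the tree.
    // Text nodes are skipped so only the element structure is shown.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        outline(parse(XML).getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

The recursion mirrors the recursive structure of XML itself: each element is handled the same way as its parent.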
RHS – SOC 28 Processing XML documents Given a tree following the DOM standard, we can address various elements in the tree, using the XPath syntax –XPath describes a single node in the tree, or a set of nodes –Syntax similar to directory paths
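RHS – SOC 29 Processing XML documents [Figure: DOM tree with root node products, child node product, and below it the nodes name, price and weight; the name node holds the text Milk] The XPath /products/product[1]/weight addresses the weight node of the first product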
RHS – SOC 30 Processing XML documents Other XPath constructions: –count(/products/product) – get the number of product instances – get the value of the attribute unit –name(/products/product[1]/*[1]) – get the name of the first child of the first product
RHS – SOC 31 Processing XML documents
    // Needs: import javax.xml.xpath.XPath; import javax.xml.xpath.XPathFactory;
    XPathFactory xpfac = XPathFactory.newInstance();
    XPath path = xpfac.newXPath();
    ...
    String result = path.evaluate("/products/product[1]/price", doc);
    // Now result contains the price of the first product...
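The XPath constructions mentioned above can be tried out on a small in-memory document. This is a sketch under assumptions: the sample data is invented, and placing the `unit` attribute on the `weight` element is a guess, since the slides only say that an attribute named unit exists.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical data; attribute placement and values are assumptions.
    static final String XML =
        "<products>"
        + "<product><name>Milk</name><price>7.95</price>"
        + "<weight unit=\"g\">1000</weight></product>"
        + "<product><name>Orange Juice</name><price>12.50</price>"
        + "<weight unit=\"g\">1500</weight></product>"
        + "</products>";

    // Evaluates an XPath expression against the sample document
    // and returns the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath path = XPathFactory.newInstance().newXPath();
            return path.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/products/product)"));          // number of products
        System.out.println(eval("/products/product[1]/weight/@unit")); // attribute value
        System.out.println(eval("name(/products/product[1]/*[1])"));   // name of first child
        System.out.println(eval("/products/product[2]/name"));         // text of an element
    }
}
```

Note that XPath positions start at 1, not 0, so `product[1]` is the first product.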
RHS – SOC 32 Processing XML documents In general, we will convert an XML document into a number of Java objects We map XML data to Java classes Up to us to define proper classes to store the data – XML does not know about classes Each element in an XML document is like an instance field, not a class…
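One possible way to do this mapping is sketched below. The `Product` class, its field types and the `ProductMapper` helper are hypothetical; the slides do not show how the author's own Product class is defined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProductMapper {
    // Hypothetical class: each child element becomes an instance field.
    static final class Product {
        final String name;
        final double price;
        final int weight;

        Product(String name, double price, int weight) {
            this.name = name;
            this.price = price;
            this.weight = weight;
        }
    }

    // Returns the trimmed text of the first child element with the given name.
    static String text(Element parent, String childName) {
        return parent.getElementsByTagName(childName).item(0).getTextContent().trim();
    }

    // Creates one Product object per <product> element in the document.
    static List<Product> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("product");
            List<Product> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new Product(text(e, "name"),
                                       Double.parseDouble(text(e, "price")),
                                       Integer.parseInt(text(e, "weight"))));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read products", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products><product><name>Milk</name>"
            + "<price>7.95</price><weight>1000</weight></product></products>"; // made-up data
        for (Product p : read(xml)) {
            System.out.println(p.name + " " + p.price + " " + p.weight);
        }
    }
}
```

The conversion from text to `double` and `int` happens in our own code; the XML document itself only carries text.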
RHS – SOC 33 Creating XML documents In addition to processing given XML documents, we often wish to programmatically produce XML documents. For this purpose, we again use the DocumentBuilder classes
RHS – SOC 34 Creating XML documents
    DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = fac.newDocumentBuilder();
    Document doc = builder.newDocument(); // newDocument is a method call
    // Now doc contains an empty DOM tree...
RHS – SOC 35 Creating XML documents We must now insert node elements into the tree, corresponding to the structure of the data Fundamental methods are: –createElement(String name); –setAttribute(String name,String value); –createTextNode(String text); –appendChild(Element e);
RHS – SOC 36 Creating XML documents createElement(String name); Creates an empty new element, with the given name Is called on the document object On a new element, we will –Set value of attributes –Add child elements, or –Add text nodes
RHS – SOC 37 Creating XML documents appendChild(Element e); Is itself called on an element Appends the element e as a child on itself This is how we create the structure for the tree!
RHS – SOC 38 Creating XML documents The previous methods are enough to create a DOM tree Usually, we combine the methods into ”helper methods”, designed to insert a certain type of element Helper methods will often call other helper methods, depending on tree structure
RHS – SOC 39 Creating XML documents
    private Element createTextElement(String name, String text) {
        Text t = doc.createTextNode(text);    // text node holding the data
        Element e = doc.createElement(name);  // element wrapping the text node
        e.appendChild(t);                     // append the node, not the String
        return e;
    }
RHS – SOC 40 Creating XML documents
    private Element createProduct(Product p) {
        Element e = doc.createElement("product");
        e.appendChild(createTextElement("name", p.getName()));
        // "" + value converts non-String values (e.g. numbers) to text
        e.appendChild(createTextElement("price", "" + p.getPrice()));
        e.appendChild(createTextElement("weight", "" + p.getWeight()));
        return e;
    }
RHS – SOC 41 Creating XML documents
    private Element createProducts(ArrayList<Product> pList) {
        Element e = doc.createElement("products");
        for (Product p : pList) {
            e.appendChild(createProduct(p)); // one child element per product
        }
        return e;
    }
RHS – SOC 42 Creating XML documents
    DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = fac.newDocumentBuilder();
    Document doc = builder.newDocument();
    // Now doc contains an empty DOM tree
    ArrayList<Product> pList = ...;
    Element root = createProducts(pList);
    doc.appendChild(root);
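The slides stop once the DOM tree is built. A sketch of how the tree could then be written out as XML text is given below, using the standard `javax.xml.transform` identity transformer. This step, the sample values and the `unit` attribute on `weight` are additions for illustration and are not taken from the slides.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriteDemo {
    // Builds a tiny DOM tree by hand and returns it serialized as XML text.
    static String buildAndSerialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("Milk"));

            Element weight = doc.createElement("weight");
            weight.setAttribute("unit", "g");              // assumed attribute
            weight.appendChild(doc.createTextNode("1000")); // made-up value

            Element product = doc.createElement("product");
            product.appendChild(name);
            product.appendChild(weight);

            Element products = doc.createElement("products");
            products.appendChild(product);
            doc.appendChild(products); // the root element goes on the document

            // A Transformer created without a stylesheet copies input to output.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

To write to a file instead, the `StreamResult` can be constructed from a `java.io.File`.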
RHS – SOC 43 Validating XML documents It will often be convenient to know if an XML document obeys certain rules about its content. This can e.g. make processing easier, since we do not need to include as much error handling. Specification of such rules can be done in various ways
RHS – SOC 44 Validating XML documents Original way – use a DTD DTD – Document Type Definition A DTD is a sequence of rules describing –The valid attributes for each element type –The valid child elements for each element type
RHS – SOC 45 Validating XML documents Examples of DTD rules: –<!ELEMENT products (product*)> - a products element must contain zero or more elements of type product –<!ELEMENT product (name, price, weight)> - a product element must have the children: one name, one price, one weight, in that order –<!ELEMENT name (#PCDATA)> - a name element must have a child of type text
RHS – SOC 46 Validating XML documents In order to validate an XML document against a DTD, the DTD must be specified –Can be included in the XML document –Can be referenced NOTE: Validation is optional, it is up to us to do it…
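A sketch of the "included in the XML document" variant follows. The DTD is written as an internal subset and validation is switched on explicitly with `setValidating(true)`. The class name `DtdValidationDemo`, the rules for price and weight, and the sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // DTD placed inside the document itself (internal subset).
    static final String DTD =
        "<!DOCTYPE products ["
        + "<!ELEMENT products (product*)>"
        + "<!ELEMENT product (name, price, weight)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT price (#PCDATA)>"
        + "<!ELEMENT weight (#PCDATA)>"
        + "]>";

    // Returns true if the body is well-formed and obeys the DTD rules.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
            fac.setValidating(true); // validation is off unless we ask for it
            DocumentBuilder builder = fac.newDocumentBuilder();
            final boolean[] ok = { true };
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { ok[0] = false; } // rule violated
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return ok[0];
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String good = "<products><product><name>Milk</name>"
            + "<price>7.95</price><weight>1000</weight></product></products>";
        String bad = "<products><product><name>Milk</name>"
            + "<weight>1000</weight><price>7.95</price></product></products>"; // wrong order
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

Without the error handler, the default parser behaviour for validity errors is implementation-dependent, so installing one is the usual way to find out whether a rule was broken.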
RHS – SOC 47 Validating XML documents A more modern way of validating XML documents is by using an XSD XSD – XML Schema Definition Provides a more general framework for specification of the document format Is itself written in XML Comes closer to actual class definitions
RHS – SOC 48 Validating XML documents
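As an illustration of XSD validation from Java, the sketch below uses the standard `javax.xml.validation` API. The schema is an assumption written for this example (it is not the schema from the original slide), and the class name `XsdValidationDemo` is hypothetical. Unlike the DTD rules, the schema can state that price is a decimal number and weight an integer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema: products holds any number of product elements,
    // each with a name (string), price (decimal) and weight (integer).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"products\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"product\" minOccurs=\"0\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"price\" type=\"xs:decimal\"/>"
        + "<xs:element name=\"weight\" type=\"xs:integer\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, validate throws on the first violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("Could not read input", e);
        }
    }

    public static void main(String[] args) {
        String good = "<products><product><name>Milk</name>"
            + "<price>7.95</price><weight>1000</weight></product></products>";
        String bad = "<products><product><name>Milk</name>"
            + "<price>cheap</price><weight>1000</weight></product></products>"; // not a decimal
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

The typed elements are what brings an XSD closer to a class definition: they correspond roughly to typed instance fields.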
RHS – SOC 49 Presenting XML documents IMPORTANT: XML is not for specifying how to present data… …but we can often define a suitable transformation of XML data, into some format we can present This also illustrates the power of XML
RHS – SOC 50 Presenting XML documents We could specify e.g some text as part of a product specification In XML, it could look like: … you must always keep this product cold… This is valid XML, but it does not specify how this text is presented
RHS – SOC 51 Presenting XML documents In one context, we might want … you must ALWAYS keep this product cold… In another context, we might want … you must always keep this product cold… We should not decide this in the XML
RHS – SOC 52 Presenting XML documents This transformation can be specified by a so-called XSLT. XSLT: XSL Transformation. Specifies a transformation from the XML document into… anything! –A Word document –An HTML page –…?
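A small sketch of applying an XSLT from Java is shown below, using the standard `javax.xml.transform` API. Both stylesheets and the sample data are invented for illustration: the same XML is turned into an HTML list by one stylesheet and into plain text by another, without touching the data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Made-up sample data.
    static final String XML =
        "<products><product><name>Milk</name></product>"
        + "<product><name>Orange Juice</name></product></products>";

    static final String HEADER =
        "<xsl:stylesheet version=\"1.0\" "
        + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";

    // Hypothetical stylesheet 1: present the product names as an HTML list.
    static final String TO_HTML = HEADER
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/products\"><ul>"
        + "<xsl:for-each select=\"product\">"
        + "<li><xsl:value-of select=\"name\"/></li>"
        + "</xsl:for-each></ul></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stylesheet 2: present the same names as plain text.
    static final String TO_TEXT = HEADER
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/products\">"
        + "<xsl:for-each select=\"product\">"
        + "<xsl:value-of select=\"name\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, TO_HTML));
        System.out.println(transform(XML, TO_TEXT));
    }
}
```

Only the stylesheet differs between the two calls, which is the point: presentation decisions stay out of the XML data.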
RHS – SOC 53 Example: Facebook data Data shown on Facebook reside in a (very large) database When someone wishes to see data for an individual, data is extracted from the database, and transformed into something we can see in a Web browser Such a transformation could be specified using an XSLT
RHS – SOC 54 Example: Facebook data Database with Facebook data → (query) → Data as XML → Transformation A (XSLT) → Data as HTML: presentation of Facebook data as we know it. The database contains a lot of information about each individual
RHS – SOC 55 Example: Facebook data This is as we are used to; data about the individual is presented in our Web browser Even when shown in a normal browser, there is so much information that it has been divided into several pages What if we want to use Facebook on our mobile phone, which has a much smaller screen?
RHS – SOC 56 Example: Facebook data Database with Facebook data → (query) → Data as XML → Transformation B (XSLT) → Data as HTML. The HTML code generated by the new transformation is adapted to a mobile phone display. Only the transformation has been changed!
RHS – SOC 57 Example: Facebook data Database with Facebook data → (query) → Data as XML, which feeds three transformations: –Transformation to Internet Explorer → Data as HTML (Web browser) –Transformation to mobile phone → Data as HTML (mobile phone) –Transformation to Word document → Data as Word document
RHS – SOC 58 Exercises Review: R23.11 Programming P23.1, P23.4