Apache Ant/Print version
The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Apache_Ant
Background
What is Apache Ant?
- An operating-system-neutral and language-neutral, XML-based "build tool"
- A scripting language for automating complex file-system processes that could otherwise be run from a command-line interface
- A.N.T. – Another Neat Tool
- Used for building projects
History of Ant
- Built by James Duncan Davidson
- Frustrated with the UNIX "make" utility
- Invented while turning a product from Sun into open source
- Make requires a literal tab character at the start of each command line
- Tabs frequently got converted to spaces during copy/paste operations, which broke the makefile
Why is Ant Strategic?
Ant is important because it helps organizations create repeatable build processes.
Repeatability is critical to organizations reaching the next level of Carnegie Mellon University's Capability Maturity Model (CMM):
- Initial
- Repeatable
- Defined
- Managed
- Optimizing
Ant helps you get from the Initial to the Repeatable level.
Ant is a Process Discipline
- Process discipline helps ensure that existing practices are retained during times of stress
- When these practices are in place, projects are performed and managed according to their documented plans
- Answers the question: how did the prior developer compile, test and install their system?
- Excellent aid for software archeologists
Software Project Lifecycle
- Version 1 and version 2 of a software package are frequently done by different groups
- Sometimes version 1 and 2 are done years apart by different teams in different countries
- Contractors and internal staff need to use the same tools
- Shared development processes, like those used in the Open Source community, would be almost impossible without tools like Ant
Ant is Operating System and Language Neutral
- Builds run on both Windows and UNIX/Linux systems
- Should run anywhere a Java VM runs
- Ant handles the difference between the "/" and "\" file separators for you
- Build targets can still do OS-specific tasks
- Works with anything that can be done from the command line
- Easy to extend (with Java)
- It is possible to extend Ant with other languages as long as they have Java bindings, but in practice most people use Java to extend Ant
Ant and XML
- If you are familiar with XML (or even HTML), you will probably learn Ant quickly
- If you are not already familiar with XML, you will need to learn some XML before you use Ant
- One of the best ways of doing this is to read many small sample Ant build files
- This book should help you do this
Adoption
Organizational Ant Adoption
We have found that most people new to Ant go through several stages of learning Ant.
The three stages of learning core Ant functionality
- Learn the grammar and syntax of XML and of a build file. This goes quickly if you know any XML or HTML (about 15 minutes).
- Learn the Ant concepts of properties, dependencies and tasks (about two hours).
- Build your vocabulary: learn the basic Ant tasks you need to get your job done (the time needed depends on what you are doing).
Transform Your Organization!
- Integrating Ant into your development/QA process
- Making it a requirement of all projects
- Specifying that all vendor deliverables MUST include a reproducible build process
- This can be problematic for Microsoft developers who are not familiar with Ant
- It can also be problematic for "visual only" development environments (Microsoft Visual Studio, Microsoft Analysis Services)
XML Summary
Brief Overview of XML
This chapter summarizes what you need to know about XML to use Apache Ant. It is not a lot of information, and users who are already familiar with HTML will pick up the concepts very quickly.
You do not need to know a lot about XML to use Ant, just a few syntactic rules.
First, you must be very careful to match up the "begin" and "end" tags. Begin tags start with "<", and end tags have the same name but start with "</". Here is a simple example:
<MyDataElement> Data Inside... </MyDataElement>
Train your eye to look for the "</" that ends a tag. If the tags do not match up, something is wrong: mismatched tags cause errors.
Data Element Nesting
All XML data elements must be paired using matching start and end tags, and each element must be closed before the element that contains it is closed:
<Parent_XML_Element>
    <Child_XML_Element>
        <Sub_Child_XML_Element> </Sub_Child_XML_Element>
    </Child_XML_Element>
</Parent_XML_Element>
Understanding this paired nesting structure is critical to creating working Ant build files.
XML Attributes
The XML begin tag may also have attributes.
<MyTag attribute1="nothing" attribute2="nothing else">Text</MyTag>
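Ant build files rely heavily on attributes, and an element with nothing between its tags is usually written as a single self-closing tag ending in "/>". The hypothetical fragment below (placeholder names, not a complete document) shows the two equivalent forms. Attribute values must always be quoted, and text between "<!--" and "-->" is an XML comment that is ignored.

```xml
<!-- Placeholder names: a begin tag with attributes, followed directly by its end tag -->
<MyTag attribute1="nothing" attribute2="nothing else"></MyTag>

<!-- The same empty element written as one self-closing tag -->
<MyTag attribute1="nothing" attribute2="nothing else"/>
```

Most Ant tasks that take only attributes, such as <property name="..." value="..."/>, are written in the self-closing form.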
Getting Started
This section contains three chapters:
- Installation: how to download and install Apache Ant
- Testing: how to test your Apache Ant installation
- Hello World: how to run a small Ant program that prints "Hello World!"
Getting Started/Installation
Before starting, you will need to have a running version of the Java Development Kit (JDK) 1.2 or later. For Ant 1.7, JDK 1.5 is recommended. Ant will need a JAXP-compliant XML parser installed and available in your classpath. The binary distribution of Ant includes the latest version of the Apache Xerces2 XML parser.
To install Ant on Windows, you can use WinAnt, the Windows installer for Apache Ant. Download and run the latest version of WinAnt and follow the directions in the installer. WinAnt will place the "ant" executable on your system path, which allows you to run the command "ant" from the command line in any directory. Alternatively, you can follow these directions:
The first step is to download Apache Ant.
There are two options: you can compile Ant from source, or (as we recommend for beginners) download a binary distribution packaged as a zip file.
You can find the download page here: http://ant.apache.org/bindownload.cgi
You will need to download a file such as
apache-ant-1.7.0-bin.zip
This is a compressed archive file. You will need to uncompress it.
If you are not familiar with the process of uncompressing a file you should consult your computer operating system manual.
After this is done, you will see a folder such as:
apache-ant-1.7.0
This folder contains another folder called "bin". Within it there is a file called ant.bat (on UNIX/Linux, a script called ant) that you can run directly from the command line.
Getting Started/Testing
To test that Apache Ant is installed correctly, type "ant -version" at the command line. You should see output similar to the following:
C:\> ant -version
Apache Ant version 1.6.5 compiled on June 2, 2005
If you do not see this, check your PATH variable to make sure that Ant's bin folder is in your path. You can view it by opening a command prompt and typing the "set" command.
To add the path in Windows: right-click "My Computer", choose Properties, open the Advanced tab and click the Environment Variables button. Find the system variable "Path" and add the path of Ant's bin folder (C:\ant\bin or wherever you unpacked Ant), separating it from the other paths with a semicolon.
Getting Started/Hello World
Hello World in Ant
Create a directory called "My Project". Use a text editor such as Kate, Gedit or Notepad to create a file called build.xml in the "My Project" directory:
<?xml version="1.0"?>
<project name="My Project" default="hello">
<target name="hello">
<echo>Hello World!</echo>
</target>
</project>
The first line must come at the very start of the file, with no blank lines or indentation before it. It tells Ant that this is an XML file:
<?xml version="1.0"?>
The next line names the (required) project "My Project" and its default target "hello":
<project name="My Project" default="hello">
The central three lines name and define the only target ("hello") and task ("echo") in the file:
<target name="hello">
    <echo>Hello World!</echo>
</target>
You can now open a shell, cd to the "My Project" directory you created, and type "ant".
Output of Hello World
Buildfile: build.xml
hello:
     [echo] Hello World!
BUILD SUCCESSFUL
Total time: 0 seconds
Variations
Try changing the echo line to the following:
<echo message="Hello There!"></echo>
What is the result? Try the following also:
<echo message="Hello There!"/>
Core Concepts
There are several things you must learn to use Apache Ant successfully:
- Basic terminology
- The structure of a build file
- Using properties
- Setting up dependencies
- Using FileSets
Core Concepts/Terminology
Ant Terminology
- Ant Task – something that Ant can execute, such as a compile, copy or replace. Most tasks have very convenient default values. See the Ant manual for a complete list of tasks.
- Ant Target – a fixed series of Ant tasks in a specified order that can depend on other named targets. Targets can depend only on other targets, not on projects or tasks. A target represents a particular item to be created: a single item like a jar, or a group of items, like classes.
- Ant Project – a collection of named targets. The order in which targets run is determined by their dependencies; in addition, many tasks (such as javac and copy) compare file time stamps and skip work whose output is already up to date. Each build file contains one project.
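To tie the three terms together, here is a minimal sketch of a build file with its project, target and task parts labeled in XML comments. The project and target names are hypothetical, and <echo> stands in for real work such as compiling:

```xml
<?xml version="1.0"?>
<!-- The project (hypothetical name): exactly one per build file -->
<project name="TerminologyDemo" default="package">

    <!-- A target: a named, ordered series of tasks -->
    <target name="compile">
        <!-- A task: a single unit of work that Ant executes -->
        <echo>Compiling...</echo>
    </target>

    <!-- This target depends on another target, never on a task or a project -->
    <target name="package" depends="compile">
        <echo>Packaging...</echo>
    </target>

</project>
```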
Build File Structure
Here is the structure of a typical build.xml file:
<?xml version="1.0"?>
<project name="MyFirstAntProject" default="MyTarget">
    <target name="init">
        <echo>Running target init</echo>
    </target>
    <target name="MyTarget" depends="init">
        <echo>Running target MyTarget</echo>
    </target>
</project>
Here are a few things to note:
- After the optional XML declaration, the begin and end tags for project (<project> and </project>) MUST start and end the file.
- The begin <project> tag should have an attribute called default, which is the name of one of the targets. (Ant 1.6 and later treat this attribute as optional, but it is good practice to include it.)
- Each build file normally has at least one target.
- The begin and end tags <target> and </target> must also match EXACTLY.
- Each target MUST have a name.
- Targets depend only on other targets and reference them by their target name. Targets NEVER depend on projects or tasks.
- The depends attribute of a target is optional.
- Anything between the <echo> and </echo> tags is written to the console when the surrounding target runs.
- Put every task inside a target. (Ant 1.6 and later also accept tasks placed directly under <project>; these run when the file is loaded, before any target.)
You can execute this from a DOS or UNIX command prompt by saving it in a file called build.xml and typing:
ant
By default, Ant looks for a file called build.xml in the current directory and runs its default target.
Here is a sample output of this build:
Buildfile: C:\AntClass\Lab01\build.xml
init:
     [echo] Running target init
MyTarget:
     [echo] Running target MyTarget
BUILD SUCCESSFUL
Total time: 188 milliseconds
Optionally, you can also pass Ant the name of the target to run as a command-line argument:
ant init
This runs only the init target:
Buildfile: C:\AntClass\Lab01\build.xml
init:
     [echo] Running target init
BUILD SUCCESSFUL
Total time: 188 milliseconds
Property
Ant does not have variables like in most standard programming languages. Ant has a structure called properties. Understanding how properties work is critical to understanding how (and why) Ant works so well.
Here is a simple demonstration of how to set and use properties:
<project name="My Project" default="MyTarget">
<!-- set global properties -->
<property name="SrcDir" value="src"/>
<property name="BuildDir" value="build"/>
<target name="MyTarget">
<echo message = "Source directory is = ${SrcDir}"/>
<echo message = "Build directory is ${BuildDir}"/>
</target>
</project>
Note that to use a property, you put a dollar sign and a left curly brace before its name and a right curly brace after it, as in ${SrcDir}. Do not confuse the curly braces with parentheses.
When you run this you should get the following:
Buildfile: C:\AntClass\PropertiesLab\build.xml
MyTarget:
     [echo] Source directory is = src
     [echo] Build directory is build
BUILD SUCCESSFUL
Total time: 204 milliseconds
Ant properties are immutable: once a property is set, it cannot be changed for the rest of the build. This may seem odd at first, but it is one of the core reasons that targets, once written, tend to run consistently without side effects. Targets only run when they have to, and you cannot always predict the order in which they will run; if a value could be overwritten, a target's behavior would depend on whatever happened to run before it.
Properties do not have to be used only inside a target. They can be set anywhere in a build file (or an external property file) and referenced anywhere in a build file after they are set.
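For example, a property set at the top level of the project can be built from another property that was set before it. This is a minimal sketch; the property name ClassesDir and the target name ShowDirs are hypothetical:

```xml
<project name="My Project" default="ShowDirs">
    <!-- Set at the top level, outside any target -->
    <property name="BuildDir" value="build"/>
    <!-- Hypothetical property whose value refers to a property that is already set -->
    <property name="ClassesDir" value="${BuildDir}/classes"/>

    <target name="ShowDirs">
        <echo>Classes go to ${ClassesDir}</echo>
    </target>
</project>
```

Running ant on this file should echo "Classes go to build/classes".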
Here is a small Ant project that demonstrates the immutability of a property:
<project name="My Project" default="MyTarget">
<target name="MyTarget">
<property name="MyProperty" value="One"/>
<!-- check to see that the property gets set -->
<echo>MyProperty = ${MyProperty}</echo>
<!-- now try to change it to a new value -->
<property name="MyProperty" value="Two"/>
<echo>MyProperty = ${MyProperty}</echo>
</target>
</project>
When you run this, you should get the following output:
Buildfile: C:\AntClass\PropertiesLab\build.xml
MyTarget:
     [echo] MyProperty = One
     [echo] MyProperty = One
BUILD SUCCESSFUL
Total time: 343 milliseconds
Note that despite trying to change MyProperty to be "Two", the value of MyProperty does not change. Ant will not warn you of this.
For newcomers this might seem strange, but this is ideally suited for building up complex trees of values that are set once and used over and over again. It makes your build scripts easy to maintain and reliable.
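Immutability leads to a common idiom: load any overrides first, then declare defaults, because whichever assignment comes first wins. A minimal sketch, assuming a hypothetical optional file named local.properties that may define BuildDir; if that file does not exist, the <property file="..."/> line is skipped without failing the build.

```xml
<project name="My Project" default="ShowBuildDir">
    <!-- 1. Load per-developer overrides from a hypothetical local.properties, if present -->
    <property file="local.properties"/>
    <!-- 2. Declare the default; ignored if BuildDir was already set above -->
    <property name="BuildDir" value="build"/>

    <target name="ShowBuildDir">
        <echo>Build directory is ${BuildDir}</echo>
    </target>
</project>
```

Properties given on the command line, such as ant -DBuildDir=out, are set before the build file is read, so they take precedence over both lines.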
Ant also has a set of built-in properties that you can use. The following build file displays several of them:
<project name="MyProject" default="Display-Builtins">
<target name="Display-Builtins" description="Display Builtin Properties">
<!-- the absolute path to the location of the buildfile -->
<echo>${basedir}</echo>
<!-- the absolute path of the buildfile -->
<echo>${ant.file}</echo>
<!-- ant.version - the version of Ant that you are running -->
<echo>${ant.version}</echo>
<!-- ant.project.name - the name of the project that is currently executing; it is set in the name attribute of <project>. -->
<echo>${ant.project.name}</echo>
<!-- ant.java.version - the version of the JVM that Ant detected, for example "1.5" -->
<echo>${ant.java.version}</echo>
</target>
</project>
When you run this program you should get an output similar to the following:
Buildfile: C:\eclipse\workspace\Ant Examples\build.xml
Display-Builtins:
     [echo] C:\AntClass\PropertiesLab
     [echo] C:\AntClass\PropertiesLab\build.xml
     [echo] Apache Ant version 1.6.2 compiled on July 16, 2004
     [echo] MyProject
     [echo] 1.5
BUILD SUCCESSFUL
Total time: 188 milliseconds
See the Ant reference manual for a full list of built-in Ant properties, and the Java documentation for System.getProperties() for the standard Java system properties.
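Java system properties are also available as ordinary Ant properties. A small sketch using standard Java system property names; the project and target names are hypothetical:

```xml
<project name="MyProject" default="Display-SystemProps">
    <!-- Hypothetical target name; the property names are standard Java system properties -->
    <target name="Display-SystemProps" description="Display some Java system properties">
        <echo>Operating system: ${os.name}</echo>
        <echo>Java home: ${java.home}</echo>
        <echo>User home: ${user.home}</echo>
        <echo>File separator: ${file.separator}</echo>
    </target>
</project>
```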
Depends
The depends attribute can be included in the target tag to specify that this target requires another target to be executed before it. Multiple targets can be listed, separated with commas.
<target name="one" depends="two, three">
Here, target "one" will not run until the targets named "two" and "three" have run.
Example of using the depends attribute
Here is an example of a build file that executes three targets in order: first, middle and last. Note that the order in which the targets appear in the build file is unimportant:
<?xml version="1.0" encoding="UTF-8"?>
<project default="three">
    <target name="one">
        <echo>Running One</echo>
    </target>
    <target name="two" depends="one">
        <echo>Running Two</echo>
    </target>
    <target name="three" depends="two">
        <echo>Running Three</echo>
    </target>
</project>
Sample Output:
Buildfile: build.xml
one:
     [echo] Running One
two:
     [echo] Running Two
three:
     [echo] Running Three
BUILD SUCCESSFUL
Total time: 0 seconds
Redundant dependency
Ant keeps track of which targets have already run and executes each target at most once per build, even when several targets depend on it. For example:
<?xml version="1.0" encoding="UTF-8"?>
<project default="three">
    <target name="one">
        <echo>Running One</echo>
    </target>
    <target name="two" depends="one">
        <echo>Running Two</echo>
    </target>
    <target name="three" depends="one, two">
        <echo>Running Three</echo>
    </target>
</project>
will produce the same output as above - the target "one" will not be executed twice, even though both "two" and "three" targets are run and each specifies a dependency on one.
Circular dependency
Similarly, Ant guards against circular dependencies, where one target depends on another which, directly or indirectly, depends on the first. So the build file:
<?xml version="1.0" encoding="UTF-8"?>
<project default="one">
    <target name="one" depends="two">
        <echo>Running One</echo>
    </target>
    <target name="two" depends="one">
        <echo>Running Two</echo>
    </target>
</project>
will yield an error:
Buildfile: build.xml
BUILD FAILED
Circular dependency: one <- two <- one
Total time: 1 second
Fileset
FileSets are Ant's way of creating groups of files to work on. The files are found in a directory tree starting at a base directory and are matched by patterns taken from a number of PatternSets and Selectors.
A FileSet identifies the base directory tree with its dir attribute. Its nested elements (include and exclude patterns, PatternSets, and Selectors) then choose the files and folders within that tree.
If any selector within the FileSet does not select a given file, that file is not part of the FileSet. This makes a FileSet equivalent to an <and> selector container.
Wildcards
Wildcards are used by Ant to specify groups of files that have a pattern to their names.
- ? : matches exactly one character.
- * : matches zero or more characters within a single file or directory name.
- ** : matches zero or more directories.
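The fragment below is a sketch of how these wildcards combine in include and exclude patterns. The property src.dir and the directory name backup are hypothetical; each comment describes what the pattern beneath it matches, relative to the FileSet's dir:

```xml
<!-- src.dir is a hypothetical property pointing at a source tree -->
<fileset dir="${src.dir}">
    <!-- *.java : .java files directly in src.dir only, not in subdirectories -->
    <include name="*.java"/>
    <!-- **/*.xml : .xml files in src.dir and in every subdirectory -->
    <include name="**/*.xml"/>
    <!-- **/Test?.java : e.g. Test1.java or TestA.java (not Test10.java), at any depth -->
    <include name="**/Test?.java"/>
    <!-- **/backup/** : removes everything below any directory named backup -->
    <exclude name="**/backup/**"/>
</fileset>
```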
Examples
The FileSets below select the Java source files in directory ${server.src} that do not have "Test" in their name.
<fileset dir="${server.src}" casesensitive="yes">
    <include name="**/*.java"/>
    <exclude name="**/*Test*"/>
</fileset>

<fileset dir="${server.src}" casesensitive="yes">
    <patternset id="non.test.sources">
        <include name="**/*.java"/>
        <exclude name="**/*Test*"/>
    </patternset>
</fileset>
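The following FileSet reuses the PatternSet defined above, by referring to its id, to select the matching files in ${client.src}:
<fileset dir="${client.src}">
    <patternset refid="non.test.sources"/>
</fileset>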
The last two FileSets select the same ${server.src} files using <filename> selectors instead of include and exclude patterns:
<fileset dir="${server.src}" casesensitive="yes">
    <filename name="**/*.java"/>
    <filename name="**/*Test*" negate="true"/>
</fileset>

<fileset dir="${server.src}" casesensitive="yes">
    <filename name="**/*.java"/>
    <not>
        <filename name="**/*Test*"/>
    </not>
</fileset>
Finally
FileSets can appear as children of the project element or inside tasks that support this feature.
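As a sketch of both placements, the hypothetical build file below declares a FileSet as a child of <project>, gives it an id, and then reuses it by refid inside a <copy> task. The project, target and directory names are placeholders:

```xml
<project name="FilesetDemo" default="copy-sources">
    <!-- Hypothetical directory names, for illustration only -->
    <property name="server.src" value="src"/>
    <property name="backup.dir" value="backup"/>

    <!-- A FileSet declared as a child of project, given an id for reuse -->
    <fileset id="server.java.sources" dir="${server.src}">
        <include name="**/*.java"/>
        <exclude name="**/*Test*"/>
    </fileset>

    <target name="copy-sources">
        <!-- copy accepts nested FileSets; refid points at the one declared above -->
        <copy todir="${backup.dir}">
            <fileset refid="server.java.sources"/>
        </copy>
    </target>
</project>
```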
Best Practices
Here are some of the Ant best practices that have been identified for creating maintainable Ant build files. Best practices are not enforced by any compiler, but they are conventions that let people who maintain many projects quickly become familiar with your build process.
Learn Ant Best Practices
Building your Ant Vocabulary
- Study Ant build scripts from other open source projects
- Learn domain-specific targets such as building jar files, doing XML transforms or complex installs
- Depending on the diversity of tasks, this might take a few hours to a few weeks
What to do about local file system paths
Standard Targets
Best Practices/Standard Targets
Standard Targets
One of the things you learn is that if you name things consistently across projects, it is much easier to find what you are looking for. When you work with other people, you also want to have targets that you are all familiar with.
build.xml
- Place your main build in a file called build.xml in the main directory of your project.
- Do not put references to local file systems (Windows C:\ etc.) in your build file. Isolate these all in a local.properties file in the main directory.
Folder standards
- src - the location of your source code
- build - the output of a build process
Standard ant targets
init
This target should create all temporary directories within the build folder.
clean
This target should remove all compiled and intermediate files, leaving only source files; in other words, it should remove anything that can be derived from other files. Run it just before creating a zip file of the project, and whenever gremlins occur during the build process.
build
This target should compile sources and perform transforms of raw data.
install
The install target should be used to copy files to a testing or production system.
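Putting the four standard targets together, a skeleton build file for a plain Java project might look like the following. This is a sketch, not a required layout: the property names (src.dir, build.dir, classes.dir, install.dir) are assumptions, and install.dir is expected to come from local.properties so that no local path appears in build.xml.

```xml
<project name="MyProject" default="build">
    <!-- Hypothetical per-machine settings, e.g. install.dir -->
    <property file="local.properties"/>
    <property name="src.dir" value="src"/>
    <property name="build.dir" value="build"/>
    <property name="classes.dir" value="${build.dir}/classes"/>

    <target name="init" description="Create temporary directories">
        <mkdir dir="${classes.dir}"/>
    </target>

    <target name="clean" description="Remove all derived files">
        <delete dir="${build.dir}"/>
    </target>

    <target name="build" depends="init" description="Compile the sources">
        <!-- includeantruntime="false" keeps Ant's own jars off the compile classpath -->
        <javac srcdir="${src.dir}" destdir="${classes.dir}" includeantruntime="false"/>
    </target>

    <target name="install" depends="build" description="Copy build output to a test or production system">
        <!-- Stop with a clear message if the hypothetical install.dir was not set -->
        <fail unless="install.dir" message="Please set install.dir in local.properties"/>
        <copy todir="${install.dir}">
            <fileset dir="${classes.dir}"/>
        </copy>
    </target>
</project>
```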
Other Standards
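Use the description attribute of each target to describe what the target does; targets that have a description are listed as "Main targets" when you run ant -projecthelp. The <description> element can be used to describe the project as a whole.
If you have more than about 100 targets in your build file, it becomes unwieldy. You could consider calling a separate build file, but that adds other complications, such as managing dependencies between targets that live in different files.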
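If you do split a build, the core <ant> task runs a target in another build file. A minimal sketch, assuming a hypothetical sub-project directory named subproject that has its own build.xml defining a build target. Targets in the parent file cannot name the sub-project's targets in their depends attribute, so the <ant> call has to be placed inside a target instead:

```xml
<project name="Parent" default="build-all">
    <target name="build-all">
        <!-- Runs the "build" target of the hypothetical subproject/build.xml.
             inheritAll="false" stops this project's properties from being passed down. -->
        <ant antfile="build.xml" dir="subproject" target="build" inheritAll="false"/>
    </target>
</project>
```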
Best Practices/Local Property Files
Using a Property file
One of the best ways to keep your build files free of local dependencies is to use a local property file:
<property file="local.properties"/>
Here is a sample of a property file:
# Property file for Project X
# Author
# Date
# Note that the format of this file adheres to the Java Property file specification
# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader)
# To use this file, put the following in your Ant build file:
# <property file="local.properties"/>
# All file names on local hard drives should be stored in this file
# Where Ant is installed. Will not work with 1.5 due to exec/spawn calls
antHome=C:/Apps/apache-ant-1.6.5
Saxon8HomeDir=C:/Apps/saxon8
saxon8jar=${Saxon8HomeDir}/saxon8.jar
# used to make sure Saxon gets the right XSLT 2.0 processor
processor=trax
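A build file can then read these settings and stop with a clear message when a required one is missing. This is a sketch that assumes the sample above was saved as local.properties next to build.xml; the project and target names are hypothetical. The <fail> task's unless attribute makes the build fail only when the named property is not set:

```xml
<project name="ProjectX" default="check-local">
    <!-- Load the developer's local settings (the sample file above) -->
    <property file="local.properties"/>

    <target name="check-local">
        <!-- Fail early if the required local setting is missing -->
        <fail unless="saxon8jar" message="Please set saxon8jar in local.properties"/>
        <!-- saxon8jar was built from Saxon8HomeDir inside the property file -->
        <echo>Using Saxon jar: ${saxon8jar}</echo>
    </target>
</project>
```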
Best Practices/Local File Systems
Dealing with Local File System Issues
- Each developer or user has the right (or is forced by administrators) to put resources such as jar files and libraries in different locations
- Try to avoid having ANY local file system location dependencies in your build files. Make sure you NEVER put C: in a build file. This is just plain bad behavior.
- Separate local file system access points in an external "property file"
- Warning: property files are read using Java's property file rules, in which the backslash is an escape character, so a Windows path such as C:\Apps is not read as written. Use forward slashes (C:/Apps), double backslashes (C:\\Apps), or, because Ant expands references to existing properties, ${file.separator}
- Allow people to check out all the files in a project including the build.xml file, customize their local library paths and build
- Third party projects such as Ivy and Maven2 Ant tasks try to automate the entire library management process. Consider these on a very large/complex project.
XML
Ant provides tasks to validate and transform XML documents:
XMLwellformed - how to use Apache Ant to check an XML file for well-formedness
XMLvalidate - how to use Apache Ant to validate an XML file against an XML Schema
XSLT - how to use Apache Ant to run an XML transform
XMLwellformed
You can use Apache ant to check a file or group of files for well-formedness. This is different from validation. Checking for well formedness simply checks for the consistency of begin and end tags. No XML Schema file is used.
This is done by using the <xmlvalidate> task. The xmlvalidate ant task will use a standard ant <fileset> and go through each file. In the example below, we specify a directory called "in" using a property. We then use the fileset to find all XML files in that directory and all subdirectories of that directory.
<project default="CheckXML">
  <property name="MYROOTDIR" value="in"/>
  <target name="CheckXML" description="Checks that all files at or below MYROOTDIR are well formed">
    <xmlvalidate>
      <fileset dir="${MYROOTDIR}" includes="**/*.xml"/>
      <attribute name="http://xml.org/sax/features/validation" value="false"/>
      <attribute name="http://apache.org/xml/features/validation/schema" value="false"/>
    </xmlvalidate>
  </target>
</project>
This target will run the default XML parser that comes with Ant (usually Xerces) and report any file that is not well-formed.
To test this example, create a folder called "in" and put several XML files in it, some of them malformed. In this case we created a malformed file called MyInputBad.xml. When we ran the build from the command line, the output was:
CheckXML:
[xmlvalidate] C:\XMLClass\Ant\in\MyInputBad.xml:5:32: The element type "MyMessage" must be terminated by the matching end-tag "</MyMessage>".
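The <xmlvalidate> task also has a lenient attribute. With the default parser, setting it to true makes the task check only that each document is well formed. The following target is a sketch of the same check using that attribute instead of the two SAX feature settings; it assumes the MYROOTDIR property from the build file above, and the target name is hypothetical.

```xml
<target name="CheckXMLLenient" description="Well-formedness check using lenient mode">
  <!-- lenient="true": check well-formedness only, no DTD or schema validation -->
  <xmlvalidate lenient="true">
    <fileset dir="${MYROOTDIR}" includes="**/*.xml"/>
  </xmlvalidate>
</target>
```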
XMLvalidate
Motivation
You want a command-line interface to validate one or more XML files.
Instructor's note: This file is used as a lab exercise for an Apache Ant class that includes extensive use of XML.
Method
You can use Apache Ant to check a file or group of files for validity. This is done by using the <xmlvalidate> Apache Ant task. The xmlvalidate task uses a standard Ant <fileset> and checks each file in it. In the example below, we specify a directory called "in" using a property. We then use the fileset to find all XML files in that directory and all of its subdirectories. Each file is checked for validity against its XML Schema.
Sample Ant Task to Validate All XML Files in a Folder
<project default="ValidateXML">
  <property name="MYROOTDIR" value="in"/>
  <target name="ValidateXML" description="Checks that all files at or below MYROOTDIR are valid against their XML Schema">
    <xmlvalidate>
      <fileset dir="${MYROOTDIR}" includes="**/*.xml"/>
      <attribute name="http://xml.org/sax/features/validation" value="true"/>
      <attribute name="http://apache.org/xml/features/validation/schema" value="true"/>
      <attribute name="http://xml.org/sax/features/namespaces" value="true"/>
    </xmlvalidate>
  </target>
</project>
In the above example, we assume that each XML file declares the location of its XML Schema, for example with an xsi:noNamespaceSchemaLocation attribute on its root element.
This target will run the default XML parser that comes with Ant (usually Xerces) and report any file that is not valid.
Sample XML Schema MyMessages.xsd
To test this you will need a small XML Schema file. The following schema describes a MyMessages document that contains one to three MyMessage elements:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="MyMessage" type="xs:string"/>
  <xs:element name="MyMessages">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="MyMessage" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Sample Valid Data File
Here is a sample message file:
<?xml version="1.0" encoding="UTF-8"?>
<MyMessages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="MyMessages.xsd">
  <MyMessage>Hello World!</MyMessage>
  <MyMessage>ANT AND XML Schema ROCK</MyMessage>
</MyMessages>
Note that the xsi:noNamespaceSchemaLocation attribute on the root element tells the parser to look for the XML Schema file (MyMessages.xsd) in the same directory as the XML file.
Sample Invalid Data File
If you add a fourth message the file should fail validation according to the rules in the XML Schema above.
<?xml version="1.0" encoding="UTF-8"?>
<MyMessages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="MyMessages.xsd">
  <MyMessage>Hello From XSLT</MyMessage>
  <MyMessage>From input: Hi</MyMessage>
  <MyMessage>ANT AND XSLT ROCK</MyMessage>
  <MyMessage>I am the fourth message.</MyMessage>
</MyMessages>
To test this example, create a folder called "in" and put several XML files in it, some of them invalid like the file above. When we ran the build from the command line, the output was:
Sample Output
ValidateXML:
[xmlvalidate] C:\XMLClass\Ant\validate\in\MyInput.xml:6:15: cvc-complex-type.2.4.d: Invalid content was found starting with element 'MyMessage'. No child element is expected at this point.
This is a sample output. Note that the error message does not say that the limit of three MyMessage elements was exceeded; it only reports that no further child element was expected at that point.
Supplying an XML Schema definition file
If you are working in the null namespace (no target namespace), nest the following Xerces parser property inside <xmlvalidate>:
<property name="http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation" value="${xsd.file}"/>
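If your documents have a namespace, enable namespace support and use the external-schemaLocation property instead. Its value is a whitespace-separated list of namespace URI and schema location pairs (the namespace URI below is a placeholder):
<attribute name="http://xml.org/sax/features/namespaces" value="true"/>
<property name="http://apache.org/xml/properties/schema/external-schemaLocation" value="http://example.com/my-namespace ${xsd.file}"/>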
If the XML files do not declare their own schema, you can tell the parser where to find it by nesting this property inside the <xmlvalidate> task itself:
<xmlvalidate file="xml/endpiece-noSchema.xml" lenient="false" failonerror="true" warn="true">
  <attribute name="http://apache.org/xml/features/validation/schema" value="true"/>
  <attribute name="http://xml.org/sax/features/namespaces" value="true"/>
  <property name="http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation" value="${xsd.file}"/>
</xmlvalidate>
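Ant also ships a <schemavalidate> task, which extends <xmlvalidate> with XML Schema specific features and lets you name a no-namespace schema with an attribute. A minimal sketch, assuming the "in" folder and the MyMessages.xsd file from the example above; the target name is hypothetical:

```xml
<target name="SchemaValidateXML">
  <!-- noNamespaceFile supplies the XSD for documents without a target namespace -->
  <schemavalidate noNamespaceFile="in/MyMessages.xsd">
    <fileset dir="in" includes="**/*.xml"/>
  </schemavalidate>
</target>
```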
Schematron Validate
Apache Ant can also validate against a Schematron rules file by using a third-party Schematron task, which must first be declared with <taskdef>:
<taskdef name="schematron"
classname="com.schematron.ant.SchematronTask"
classpath="lib/ant-schematron.jar"/>
<schematron schema="rules.sch" failonerror="false">
<fileset dir="." includes="schematron-input.xml"/>
</schematron>
See http://www.schematron.com/resource/Using_Schematron_for_Ant.pdf for more details.
XSLT
Apache Ant has a task called <xslt> (or its synonym <style>) that performs an XML transform on a file or group of files.
Here is an example XML transformation target:
<target name="MyXSLT">
<xslt in="MyInput.xml"
out="MyOutput.xml"
style="MyTransform.xslt">
</xslt>
</target>
In the ant target there are three files you must specify:
- in: the name of the source XML input file
- out: the name of the XML output file
- style: the name of the XSLT file
To test this you can create a "dummy" input file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Input>Hi</Input>
</root>
Hello World XSLT Transform
To get started, here is a small "hello world" transform file. The transform matches the root node of the input document (match="/") but does not actually process any of the input file data elements:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<MyMessage>Hello World!</MyMessage>
</xsl:template>
</xsl:stylesheet>
You can now execute this from a command line. The following is an example run from a Microsoft Windows command shell:
C:\XMLClass\XSLT\Lab1>ant
Buildfile: build.xml
MyXSLT:
[xslt] Processing C:\XMLClass\XSLT\Lab1\MyInput.xml to C:\XMLClass\XSLT\Lab1\MyOutput.xml
[xslt] Loading stylesheet C:\XMLClass\XSLT\Lab1\MyTransform.xslt
BUILD SUCCESSFUL
Total time: 1 second
The output will appear in a file called MyOutput.xml
<?xml version="1.0" encoding="UTF-8"?>
<MyMessage>Hello World!</MyMessage>
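The <xslt> task can also transform a whole directory in one step by using basedir, destdir and includes instead of in and out; each result file gets the extension given by the extension attribute. A sketch, assuming hypothetical xml and out directories next to the build file:

```xml
<target name="MyBatchXSLT">
  <!-- Transform every .xml file in xml/ into out/, one result file per input -->
  <xslt basedir="xml"
        destdir="out"
        includes="*.xml"
        extension=".xml"
        style="MyTransform.xslt"/>
</target>
```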
Transforming Files containing external References
Sometimes you may need to transform XML files containing external references, like URLs in DTDs or Schema definitions.
Quite often, parsing or validating against such external files cannot be completely disabled. Saxon, for example, will want to read DTDs even if DTD processing is disabled (parameter "-dtd:off" or equivalent).
In such cases the development workstation may also be on a company intranet that is separated from the internet by a firewall, so that it needs a proxy or SOCKS configuration to reach the external files.
One way to make the transformation succeed is to add this connection configuration to the Ant script with the <setproxy> task.
Example (taken from a bigger build.xml file):
<target name="xdoclet-merge-top" depends="init, proxy-set" >
<xslt style="${XDocletDescDir}/merge.xslt"
in="${XDocletDescDir}/merge1.xml"
out="${XDocletDescDir}/jboss-2.xml" force="true" >
<classpath location="${ZubehoerDir}/SaxonHE9-4-0-1J/saxon9he.jar" />
</xslt>
</target>
<target name="proxy-set">
<setproxy proxyhost="proxy.mynet.de" proxyport="8080" proxyuser="" proxypassword=""/>
</target>
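Where the external files are DTDs that you can keep locally, another option is to give <xslt> a nested <xmlcatalog> that maps the DTD's public identifier to a local copy, so that no network access is needed. This is a sketch: the public identifier and the dtd/jboss.dtd path are hypothetical, and whether the catalog is actually consulted can depend on the XSLT processor in use.

```xml
<target name="xdoclet-merge-offline">
  <xslt style="${XDocletDescDir}/merge.xslt"
        in="${XDocletDescDir}/merge1.xml"
        out="${XDocletDescDir}/jboss-2.xml" force="true">
    <!-- Resolve the DTD reference to a local file instead of a URL -->
    <xmlcatalog>
      <dtd publicId="-//Example//DTD Hypothetical 1.0//EN" location="dtd/jboss.dtd"/>
    </xmlcatalog>
  </xslt>
</target>
```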
Passing Parameters from Ant into an XSLT script
You can also pass parameters from an Ant build file into an XSLT. This is handy if you need to run the same transform with small variations. You do this by adding a <param> element inside the <xslt> task:
<param name="MyParameter" expression="ANT AND XSLT ROCK"/>
The ant task now looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<project default="MyXSLT">
<target name="MyXSLT">
<xslt
in="MyInput.xml"
out="MyOutput.xml"
style="MyTransform.xslt">
<param name="MyParameter" expression="ANT AND XSLT ROCK"/>
</xslt>
</target>
</project>
Here is a sample transform that takes a single input parameter:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:param name="MyParameter"/>
<xsl:template match="/">
<MyMessages>
<MyMessage>Hello From XSLT</MyMessage>
<MyMessage>From input: <xsl:value-of select="/root/Input"/>
</MyMessage>
<MyMessage>
<xsl:value-of select="$MyParameter"/>
</MyMessage>
</MyMessages>
</xsl:template>
</xsl:stylesheet>
This will create the following output:
<?xml version="1.0" encoding="UTF-8"?>
<MyMessages>
<MyMessage>Hello From XSLT</MyMessage>
<MyMessage>From input: Hi</MyMessage>
<MyMessage>ANT AND XSLT ROCK</MyMessage>
</MyMessages>
Note that there are three different lines. One came from the transform file, one came from the input XML file and one was passed directly in from the ant file.
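Because expression is an ordinary Ant attribute, it can also take its value from an Ant property, so the parameter can be changed from the command line without editing the build file. A sketch, assuming a hypothetical property named greeting:

```xml
<project default="MyXSLT">
  <!-- Default value; a -Dgreeting=... on the command line takes precedence -->
  <property name="greeting" value="ANT AND XSLT ROCK"/>
  <target name="MyXSLT">
    <!-- force="true": re-run even if MyOutput.xml is newer than the input and stylesheet -->
    <xslt in="MyInput.xml" out="MyOutput.xml" style="MyTransform.xslt" force="true">
      <param name="MyParameter" expression="${greeting}"/>
    </xslt>
  </target>
</project>
```

Running ant -Dgreeting="Hello from the command line" would then place that text in the third MyMessage element. The force attribute matters here because only the parameter changed, so the timestamp check alone would skip the transform.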
Other ways to use XSLT within Apache Ant
Checking dependencies
By default, the <xslt> task compares file timestamps and only re-runs the transform when the input file or the stylesheet is newer than the output file. However, a stylesheet often imports other stylesheets, and Ant does not check the timestamps of imported files. (Perhaps this will be added as an option in the future.) But all is not lost: we can get the same effect by using the <dependset> task. Here is an example:
<dependset>
<srcfilelist dir="${XSLTDir}"
files="Content2HTML.xsl, HTMLHeader.xsl,PageHeader.xsl,LeftNav.xsl,PageFooter.xsl"/>
<targetfileset
dir="${BuildDir}"
includes="*.htm"/>
</dependset>
In the above example the main transform (Content2HTML.xsl) imports the other four page-fragment transforms located in XSLTDir (HTMLHeader.xsl, PageHeader.xsl, LeftNav.xsl and PageFooter.xsl), and the generated .htm files are written to the BuildDir directory. If any of the listed stylesheets is newer than any of the output files, <dependset> deletes all of the output files, so the next <xslt> run regenerates them.
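Since <dependset> only deletes stale outputs, it has to run before the <xslt> task that rebuilds them. One way to arrange this is to put both in the same target, as in this sketch; the ContentDir property and the *.xml content files are hypothetical.

```xml
<target name="build-pages">
  <!-- Step 1: delete the .htm outputs if any stylesheet is newer than them -->
  <dependset>
    <srcfilelist dir="${XSLTDir}"
        files="Content2HTML.xsl,HTMLHeader.xsl,PageHeader.xsl,LeftNav.xsl,PageFooter.xsl"/>
    <targetfileset dir="${BuildDir}" includes="*.htm"/>
  </dependset>
  <!-- Step 2: regenerate any missing or out-of-date pages -->
  <xslt basedir="${ContentDir}" destdir="${BuildDir}" includes="*.xml"
        extension=".htm" style="${XSLTDir}/Content2HTML.xsl"/>
</target>
```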
This is a handy way to build a little Ant-based web content management system. You just put the HTML content in a directory and the transform can wrap the HTML headers, navigation bars and footers around your content. The HTML for each page can just be a <div> section that is copied into the output using the <xsl:copy-of> instruction.
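A minimal sketch of such a wrapper stylesheet, assuming each content file has a single <div> root element in no namespace; the title and the header and footer markup are placeholders:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>My Site</title>
      </head>
      <body>
        <div class="header">Site header goes here</div>
        <!-- Copy the page's <div> element, with all of its content, unchanged -->
        <xsl:copy-of select="/div"/>
        <div class="footer">Site footer goes here</div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```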
References
- Apache Ant XSLT task documentation: http://ant.apache.org/manual/Tasks/xslt.html
Running Saxon
Motivation
You want to have an Apache Ant task that runs the Saxon XSLT transform.
Method
Download the Saxon jar file and put it in a lib folder (the build file below uses lib\saxon8.jar; adjust the name to match your Saxon version). Then run the following test.
Source Code
Build File
The following is how Saxon is invoked from Apache Ant.
<target name="test-saxon">
<xslt classpath="lib\saxon8.jar"
in="in.xml"
out="out.html"
style="check-version.xsl">
<factory name="net.sf.saxon.TransformerFactoryImpl"/>
</xslt>
</target>
Note that if you are running Ant inside Eclipse, you will have to open the "Preferences" dialog and add the Saxon jar file (for example saxon9.jar) to Ant/Runtime/Ant Home Entries. Click "Add JARs" and add the jar file to the end of this list.
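If you would rather not hard-code the jar file name, a nested <classpath> can pick up whichever Saxon jar is in the lib folder. A sketch, assuming the jar file name starts with "saxon":

```xml
<target name="test-saxon">
  <xslt in="in.xml" out="out.html" style="check-version.xsl">
    <!-- Use Saxon instead of the JDK's built-in XSLT processor -->
    <factory name="net.sf.saxon.TransformerFactoryImpl"/>
    <!-- Any saxon*.jar in lib, e.g. saxon8.jar or saxon9he.jar -->
    <classpath>
      <fileset dir="lib" includes="saxon*.jar"/>
    </classpath>
  </xslt>
</target>
```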
XSLT Version Check
check-version.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<results>
<Version><xsl:value-of select="system-property('xsl:version')" /></Version>
<Vendor><xsl:value-of select="system-property('xsl:vendor')" /></Vendor>
<Vendor-URL><xsl:value-of select="system-property('xsl:vendor-url')" /></Vendor-URL>
</results>
</xsl:template>
</xsl:stylesheet>
Or if you are generating a web page:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>XSL Version</title>
</head>
<body>
<p>Version:
<xsl:value-of select="system-property('xsl:version')" />
<br />
Vendor:
<xsl:value-of select="system-property('xsl:vendor')" />
<br />
Vendor URL:
<xsl:value-of select="system-property('xsl:vendor-url')" />
</p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Results for XALAN
Results for Apache XALAN
Version: 1.0 Vendor: Apache Software Foundation (Xalan XSLT) Vendor URL: http://xml.apache.org/xalan-j
Results for Saxon
Version: 2.0 Vendor: SAXON 9.1.0.7 from Saxonica Vendor URL: http://www.saxonica.com/
Passing Parameters to XSLT
Motivation
You want to call a transform with a set of parameters. You want to be able to set these parameters from a build file.
Build File Target
<!-- sample target to demonstrate passing parameters from an Ant file to an XSL transform -->
<target name="Parameterized XSLT Test">
  <echo>Running parameterized XSLT test</echo>
  <xslt in="null.xml" out="tmp/param-output.xhtml" style="xslt/TransformWithParametersTest.xsl">
    <factory name="net.sf.saxon.TransformerFactoryImpl"/>
    <param name="Parameter1" expression="true"/>
    <param name="Parameter2" expression="Hello World"/>
    <param name="Parameter3" expression="1234567"/>
  </xslt>
  <!-- print the generated file to the console -->
  <concat>
    <fileset dir="tmp" includes="param-output.xhtml"/>
  </concat>
</target>
Input null.xml
XSLT must have an input, but this example does not use it.
<root/>
XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
  <xsl:param name="Parameter1" select="true()" as="xs:boolean"/>
  <xsl:param name="Parameter2" required="yes" as="xs:string"/>
  <xsl:param name="Parameter3" required="yes" as="xs:integer"/>
  <xsl:output method="xhtml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Test of Passing Three Parameters (boolean, string, integer)</title>
      </head>
      <body>
        <h1>Test of Passing Three Parameters (boolean, string, integer)</h1>
        <p>The following parameters have been set by the Apache Ant build file.</p>
        <ul>
          <li><b>Parameter1: </b><xsl:value-of select="$Parameter1"/></li>
          <li><b>Parameter2: </b><xsl:value-of select="$Parameter2"/></li>
          <li><b>Parameter3: </b><xsl:value-of select="$Parameter3"/></li>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XQuery
Motivation
You want to transform an XML document with XQuery using an ant task.
Method
We will use the Saxon library to demonstrate this.
Steps:
- Download the Saxon library from SourceForge
- Download a sample XQuery from the Saxon samples (for example tour.xq)
- Copy the Saxon jar file into your project. In the example below a single jar file is copied to the location saxonhe9-2-0-6j/saxon9he.jar
Sample Ant Target
This sample uses the java task to run an XQuery program using the Saxon Java library. In the example below the XQuery tour.xq is executed and its output is written to the file output.html.
Note that the starting point is set by passing start=e5 as an external parameter to the XQuery.
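<target name="run-saxon-xquery">
<!-- fork="true" runs Saxon in its own JVM; failonerror requires a forked JVM -->
<java classname="net.sf.saxon.Query" output="output.html" fork="true" failonerror="true">
<arg value="tour.xq"/>
<classpath>
<pathelement location="saxonhe9-2-0-6j/saxon9he.jar"/>
</classpath>
<arg value="start=e5"/>
</java>
<!-- On Windows, this will open Firefox after the query is done -->
<!-- executable plus a separate arg avoids splitting the path at the space in "Program Files" -->
<exec executable="C:\Program Files\Mozilla Firefox\firefox.exe">
<arg value="C:\ws\Saxon-Test\output.html"/>
</exec>
</target>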
Converting Excel to XML
Motivation
You want to automatically extract a well-formed XML file from a binary Excel document.
Method
We will use the java Ant task within a build target.
Input File
[edit | edit source]We will create a sample Microsoft Excel file that has two columns like the following:
Save this into a file 'sample.xls'.
Next, download the Apache Tika jar file and put it on your local hard drive.
You can get the downloads from http://tika.apache.org/download.html. The main Tika jar file is about 27MB.
I put the Tika jar file in D:\Apps\tika, but you can change this location by editing the lib.dir property in the build file.
Create a file called "build.xml":
Sources
<project name="extract-xml-from-xsl" default="extract-xml-from-xsl">
<description>Sample Extract XML from Excel xls file with Apache Tika</description>
<property name="lib.dir" value="D:\Apps\tika"/>
<property name="input-file" value="sample.xls"/>
<target name="extract-xml-from-xsl">
<echo message="Extracting XML from Excel file: ${input-file}"/>
<java jar="${lib.dir}/tika-app-1.3.jar" fork="true" failonerror="true"
maxmemory="128m" input="${input-file}" output="sample.xml">
<arg value="-x" />
</java>
</target>
</project>
The <java> task runs Tika. The argument "-x" (for XML) tells Tika to extract the content of the input as XHTML.
Other command line options are listed here: http://tika.apache.org/1.3/gettingstarted.html
Now open a command shell (DOS or UNIX), cd into the directory that contains your build file, and type "ant".
Run
$ ant
Buildfile: D:\ws\doc-gen\trunk\build\tika\build.xml
extract-xml-from-xsl:
[echo] Extracting XML from Excel file: sample.xls
BUILD SUCCESSFUL
Total time: 1 second
Sample Output
Note that the output is a well-formed XHTML file with a table in it:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="meta:last-author" content="Dan" />
<meta name="meta:creation-date" content="2013-03-04T17:20:19Z" />
<meta name="dcterms:modified" content="2013-03-04T17:22:01Z" />
<meta name="meta:save-date" content="2013-03-04T17:22:01Z" />
<meta name="Last-Author" content="Dan" />
<meta name="Application-Name" content="Microsoft Excel" />
<meta name="dc:creator" content="Dan" />
<meta name="Last-Modified" content="2013-03-04T17:22:01Z" />
<meta name="Author" content="Dan" />
<meta name="dcterms:created" content="2013-03-04T17:20:19Z" />
<meta name="date" content="2013-03-04T17:22:01Z" />
<meta name="modified" content="2013-03-04T17:22:01Z" />
<meta name="creator" content="Dan" />
<meta name="Creation-Date" content="2013-03-04T17:20:19Z" />
<meta name="meta:author" content="Dan" />
<meta name="extended-properties:Application" content="Microsoft Excel" />
<meta name="Content-Type" content="application/vnd.ms-excel" />
<meta name="Last-Save-Date" content="2013-03-04T17:22:01Z" />
<title></title>
</head>
<body>
<div class="page"><h1>Sheet1</h1>
<table>
<tbody>
<tr>
<td>Name</td>
<td>Phone</td>
</tr>
<tr>
<td>Peg</td>
<td>123</td>
</tr>
<tr>
<td>Dan</td>
<td>456</td>
</tr>
<tr>
<td>John</td>
<td>789</td>
</tr>
<tr>
<td>Sue</td>
<td>912</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
Cleaning up HTML
Motivation
We want to clean up HTML that is not well formed. We will use the Apache Tika tools to convert dirty HTML to well-formed XHTML.
Sample Ant File
<project name="tika tests" default="extract-xhtml-from-html">
<description>Sample invocations of Apache Tika</description>
<property name="lib.dir" value="../lib"/>
<property name="input-dirty-html-file" value="input-dirty.html"/>
<property name="output-clean-xhtml-file" value="output-clean.xhtml"/>
<target name="extract-xhtml-from-html">
<echo message="Cleaning up dirty HTML file: ${input-dirty-html-file} to ${output-clean-xhtml-file}"/>
<java jar="${lib.dir}/tika-app-1.3.jar" fork="true" failonerror="true"
maxmemory="128m" input="${input-dirty-html-file}" output="${output-clean-xhtml-file}">
<arg value="-x" />
</java>
</target>
</project>
Sample Input
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Dirty HTML</title>
</head>
<body>
<p><b>test</b></p>
<p><b>test<b></p>
<p>test<br/>test</p>
<p>test<br>test<br>test</p>
<p>This is <B>bold, <I>bold italic, </b>italic, </i>normal text</p>
</body>
</html>
Sample Output
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="Content-Type" content="application/xhtml+xml"/>
<meta name="dc:title" content="Dirty HTML"/>
<title>Dirty HTML</title>
</head>
<body>
<p>test</p>
<p>test</p>
<p>test
test</p>
<p>test
test
test</p>
<p>This is bold, bold italic, italic, normal text</p>
</body></html>
Converting PDF to XML
Apache Ant Project to Extract Text From PDF
<project name="extract-text-from-pdf" default="extract-text-from-pdf">
<description>Sample invocations of Apache Tika</description>
<property name="lib.dir" value="../lib"/>
<property name="input-pdf-file" value="myDocument.pdf"/>
<property name="output-clean-xhtml-file" value="output-clean.xhtml"/>
<target name="extract-text-from-pdf">
<echo message="Extracting XML from PDF: ${input-pdf-file} to ${output-clean-xhtml-file}"/>
<java jar="${lib.dir}/tika-app-1.3.jar" fork="true" failonerror="true"
maxmemory="128m" input="${input-pdf-file}" output="${output-clean-xhtml-file}">
<arg value="-x" />
</java>
</target>
</project>
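Despite the project name, the -x option produces XHTML rather than plain text. The Tika application also has a -t option that outputs only the plain text. A sketch of an extra target you could add inside the <project> element above; the target name and the output.txt file name are hypothetical:

```xml
<target name="extract-plain-text-from-pdf">
  <echo message="Extracting plain text from PDF: ${input-pdf-file}"/>
  <!-- -t asks tika-app for plain text instead of XHTML -->
  <java jar="${lib.dir}/tika-app-1.3.jar" fork="true" failonerror="true"
        maxmemory="128m" input="${input-pdf-file}" output="output.txt">
    <arg value="-t"/>
  </java>
</target>
```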
Store XML data
Motivation
You want to upload a file or a hierarchy of files into eXist.
Method
We will use the xdb:store task and demonstrate how to use its options to load subfolders.
Sample Code
Each build file must have four key components:
- a reference to internal files on your hard drive (ideally in a properties file)
- a typedef for your Ant eXist extensions
- a path to tell it where to get the jar files
- a target to do the load
<project xmlns:xdb="http://exist-db.org/ant" default="upload-collection-to-exist">
<!-- This is where I put my copy of the eXist trunk code -->
<!-- It is the result of a subversion checkout from https://exist.svn.sourceforge.net/svnroot/exist/trunk -->
<property name="exist-home" value="C:\ws\exist-trunk"/>
<!-- this tells us where to find the key jar files relative to the ${exist-home} property -->
<path id="classpath.core">
<fileset dir="${exist-home}/lib/core">
<include name="*.jar"/>
</fileset>
<pathelement path="${exist-home}/exist.jar"/>
<pathelement path="${exist-home}/exist-optional.jar"/>
</path>
<typedef resource="org/exist/ant/antlib.xml" uri="http://exist-db.org/ant">
<classpath refid="classpath.core"/>
</typedef>
<target name="upload-collection-to-exist">
<echo message="Loading Documents to eXist."/>
<xdb:store
uri="xmldb:exist://localhost:8080/xmlrpc/db/my-project"
createcollection="true"
createsubcollections="true"
user="admin" password="">
<fileset dir="C:\ws\my-project\trunk\db\my-project">
<include name="**/*.*"/>
</fileset>
</xdb:store>
</target>
</project>
Using a local.properties File to Load XML Data
The script above will work fine if you have a single user with one set of local files. But if you have many users, each user may put their local files in a different location. In that case you will want to isolate all local file references in a file called local.properties.
The following example is from the eXist documentation project, for a server running on port 8080 with the context set to "/":
# Local Property file for eXist documentation project
#
# this file is loaded into the build.xml file using the <property file="local.properties"/>
# it contains any local references to your
# Properties on a Windows system
exist-home=C:\\ws\\exist-trunk
exist-docs=C:\\ws\\exist-docs
user=admin
password=
uri=xmldb:exist://localhost:8080/xmlrpc/db/apps/exist-docs
<project xmlns:xdb="http://exist-db.org/ant" default="upload-exist-docs-app"
name="eXist Load Example">
<!-- this is where we set our exist-home, user, password and the place that we will load the docs -->
<property file="local.properties"/>
<!-- this tells us where to find the key jar files relative to the ${exist-home} property -->
<path id="classpath.core">
<fileset dir="${exist-home}/lib/core">
<include name="*.jar"/>
</fileset>
<pathelement path="${exist-home}/exist.jar"/>
<pathelement path="${exist-home}/exist-optional.jar"/>
</path>
<typedef resource="org/exist/ant/antlib.xml" uri="http://exist-db.org/ant">
<classpath refid="classpath.core"/>
</typedef>
<!-- upload app -->
<target name="upload-exist-docs-app">
<echo message="Loading eXist documentation system to eXist."/>
<xdb:store uri="${uri}" createcollection="true"
createsubcollections="true" user="${user}" password="${password}">
<fileset dir="${exist-docs}">
<include name="**/*.*"/>
</fileset>
</xdb:store>
</target>
<target name="show-properties">
<echo message="exist-home=${exist-home}"/>
<echo message="exist-docs=${exist-docs}"/>
<echo message="uri=${uri}"/>
</target>
</project>
References
The eXist store task is documented here: http://exist-db.org/exist/apps/doc/ant-tasks.xml#D2.2.10
Reindex a Collection
Motivation
You want a simple ant task that will reindex a collection.
Method
We will use an Ant task to call an XQuery that runs eXist's reindex function. Because there is no dedicated Ant task for reindexing, we will use the xquery task to execute a remote XQuery that performs the reindex.
Here is a link to the ant task to run an XQuery http://exist-db.org/ant-tasks.html#N1041F
Call a remote XQuery by file name
<target name="reindex-collection">
<xdb:xquery user="${user}" password="${password}"
uri="${test-server}${collection}" query="reindex.xq"
outputproperty="result">
</xdb:xquery>
<echo message="Result = ${result}"/>
</target>
Supply the Body of an XQuery
<target name="inline-query">
<xdb:xquery uri="${test-server}/db"
user="${user}" password="${password}"
outputproperty="result">
xmldb:reindex('/db/mycollection')
</xdb:xquery>
<!-- note, this only returns a SINGLE line -->
<echo message="Result = ${result}"/>
</target>
Execute an XQuery
Motivation
You want to execute an XQuery that is stored in an eXist database.
Remote execution of an inline query
<target name="run-one-inline-test-local" description="Execute a single xUnit test on a local system">
<echo message="Run an inline XQuery"/>
<xdb:xquery uri="xmldb:exist://localhost/xmlrpc/db" user="${user}" password="${password}"
outputproperty="result">
xquery version "1.0";
let $message := 'Hello World!'
return $message
</xdb:xquery>
<echo message="Result = ${result}"/>
</target>
Note that in this example you can only return a string. Any XML markup in the query would be parsed by Ant as part of the build file and would generate an error.
If you want to return XML into a property, you will need to wrap your query in a CDATA section:
<!-- This version uses CDATA to put an XML file into the result property -->
<target name="run-xquery-cdata">
<xdb:xquery user="admin" password="" uri="${test-server}/db" outputproperty="result"><![CDATA[
xquery version "1.0";
let $message := 'Hello World'
return
<result>{$message}</result>
]]></xdb:xquery>
<echo message="Result = ${result}"/>
</target>
Execute an XQuery Stored in Local Drive
hello-world.xq:
xquery version "1.0";
let $message := 'Hello World'
return
<result>{$message}</result>
This is similar to the version above, but the query is now read from a local file named by the queryfile attribute.
<target name="run-in-database-query" depends="load-test-resources">
<xdb:xquery user="${user}" password="${password}"
uri="xmldb:exist://localhost/xmlrpc/db" queryfile="hello-world.xq"
outputproperty="result"/>
<echo message="Result = ${result}"/>
</target>
Note that for the above to work, the file hello-world.xq MUST be in the same directory as the build script.
Adding Execute Permissions
<target name="add-execute">
<!-- make the controller.xql file executable -->
<xdb:chmod uri="${local-uri}/apps/myapp" resource="controller.xql" permissions="group=+execute,other=+execute"/>
</target>
Here local-uri is something like xmldb:exist://localhost:8080/exist/xmlrpc/db for the default installation path.
Creating a .xar file
Motivation
This example is under development!
You want to create an XML archive file (.xar file) directly from your source code that can be used to load library modules or applications into a native XML database. This makes it much easier for users to install your module or application. The packaging process does all the work of uploading your files into the correct location on a running eXist server and also sets all the permissions of the XQuery files (.xq) for you automatically.
Method
We need to create a "zip" file with all the right components in it.
The format of the package is here:
The eXist-specific package documentation is here:
http://demo.exist-db.org/exist/apps/doc/repo.xml
GUI Package vs. On-Disk Library vs. In DB Library
There are three types of installation packages:
- An external library that is not in the database
- A library that is loaded into the database
- A full application with a GUI
For a library without a GUI that is deployed into the database, you must set both the target and the type in repo.xml, using the following structure:
target="some /db path" + type="library"
For a simple XQuery library package, which only needs to be registered with eXist but not deployed into the eXist database, the target should not be set:
no target + type="library"
Sample Package Structure
The archive must contain two XML descriptor files in the root directory: expath-pkg.xml and repo.xml
Sample expath-pkg.xml file
<package xmlns="http://expath.org/ns/pkg" name="http://example.com/apps/myapp"
abbrev="myapp" version="0.1" spec="1.0">
<title>My Cool Application</title>
<dependency package="http://exist-db.org/apps/xsltforms"/>
</package>
Note that the file name and the string in the namespace are "pkg" but the element name and the attribute in the dependency are "package". Make sure to keep these clear.
The format of this XML file is described in the EXPath documentation.
Sample repo.xml file that contains instructions for the eXist-specific packaging
<meta xmlns="http://exist-db.org/xquery/repo">
<description>My eXist application</description>
<author>Dan McCreary</author>
<website>http://danmccreary.com</website>
<status>alpha</status>
<license>GNU-LGPL</license>
<copyright>true</copyright>
<!-- set this to "application" (without quotes) for systems that have a GUI -->
<type>application</type>
<target>myapp</target>
<prepare>pre-install.xql</prepare>
<finish>post-install.xql</finish>
<permissions user="admin" password="" group="dba" mode="rw-rw-r--"/>
<!-- this element is automatically added by the deployment tool -->
<deployed>2012-11-28T23:15:39.646+01:00</deployed>
</meta>
Sample Apache Ant Target to Generate an Application .xar file
This Ant target needs the following input properties:
- source-dir - the directory where you keep your source code
- package-dir - a working directory, such as /tmp/my-package, where the .xar file is written
- app-name - the name of your application
- app-version - the version of your application
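These inputs are ordinary Ant properties. A minimal sketch of how they might be given default values near the top of build.xml follows; all four values are hypothetical placeholders. Because Ant properties are immutable once set, a value passed on the command line (for example -Dapp-version=0.2) takes precedence over these defaults.

```xml
<!-- Hypothetical default values for the inputs of generate-app-xar -->
<property name="source-dir" value="src" />
<property name="package-dir" value="/tmp/my-package" />
<property name="app-name" value="myapp" />
<property name="app-version" value="0.1" />
```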
The packaging process consists of these steps:
- verify that repo.xml and expath-pkg.xml exist in the source directory and copy them into a temporary directory
- copy all application files into the temporary directory
- create a zip file from the contents of the temporary directory in the packages area, and upload it to a repository if needed
<target name="generate-app-xar" description="Generate Application xar archive file">
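<echo>Making package for ${app-name} using source from ${source-dir}</echo>
<mkdir dir="${package-dir}" />
<zip destfile="${package-dir}/${app-name}-${app-version}.xar">
<fileset dir="${source-dir}">
<!-- "**/*" also picks up files that have no extension -->
<include name="**/*" />
<!-- exclude Subversion metadata directories and their contents -->
<exclude name="**/.svn/**" />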
</fileset>
</zip>
<echo>Package is stored at ${package-dir}/${app-name}-${app-version}.xar</echo>
</target>
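The target above zips ${source-dir} as it is, so it does not perform the verify and copy steps listed earlier. A minimal sketch of the full sequence follows; the target name prepare-app-package and the staging-dir property are hypothetical and not part of the original build. It uses Ant's fail task with a nested condition to stop early when a descriptor is missing.

```xml
<!-- Hypothetical target: verify the descriptors, stage the files, then zip them.
     "staging-dir" is an assumed property, e.g. /tmp/my-package/staging -->
<target name="prepare-app-package" description="Verify descriptors, stage files and build the xar">
  <!-- fail early if either required descriptor is missing -->
  <fail message="repo.xml not found in ${source-dir}">
    <condition>
      <not><available file="${source-dir}/repo.xml" /></not>
    </condition>
  </fail>
  <fail message="expath-pkg.xml not found in ${source-dir}">
    <condition>
      <not><available file="${source-dir}/expath-pkg.xml" /></not>
    </condition>
  </fail>
  <!-- copy the descriptors and all application files into the staging directory;
       Ant's default excludes already skip .svn and .git metadata -->
  <mkdir dir="${staging-dir}" />
  <copy todir="${staging-dir}">
    <fileset dir="${source-dir}" />
  </copy>
  <!-- pack the staged copy; the descriptors end up in the archive root -->
  <mkdir dir="${package-dir}" />
  <zip destfile="${package-dir}/${app-name}-${app-version}.xar" basedir="${staging-dir}" />
</target>
```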
Sample Apache Ant Target to Generate a Library .xar file
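This target depends on the following Ant properties:
- ant.project.name - the name of the project (set by the name attribute of the project element)
- xslt.dir - the directory where the XSLT script is stored
- temp.dir - a working directory, such as /tmp, for temporary files; the package contents are collected in ${temp.dir}/files and the .xar file is written to ${temp.dir}/archives
- web.specs.dir - the directory holding the specifications; the input file is ${web.specs.dir}/${ant.project.name}/${ant.project.name}.xml
- module-version - the version number written into the package descriptors
- eXist-main-class-name - the simple name of the module's main class in the org.expath.exist package, passed to the XSLT script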
<target name="generate-xar" description="Generate xar archive">
<echo>Making ${ant.project.name}.xar...</echo>
<!-- run a transform on the input specification file; the descriptor files are written by xsl:result-document, and the principal output a.xml is only a throwaway -->
<xslt force="true" style="${xslt.dir}/generate-xar-descriptors.xsl"
in="${web.specs.dir}/${ant.project.name}/${ant.project.name}.xml"
out="${temp.dir}/files/a.xml">
<param name="module-version" expression="${module-version}" />
<param name="eXist-main-class-name" expression="${eXist-main-class-name}" />
</xslt>
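<!-- a.xml is not part of the package, so remove it before zipping -->
<delete file="${temp.dir}/files/a.xml" />
<!-- now create the .xar file with all our files in the right place -->
<mkdir dir="${temp.dir}/archives" />
<zip destfile="${temp.dir}/archives/${ant.project.name}-${module-version}.xar">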
<fileset dir="${temp.dir}/files">
<include name="**/*.*" />
<exclude name="*-tests.jar" />
</fileset>
</zip>
</target>
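The xslt task in this target uses whatever XSLT processor Ant finds by default, which is usually the JDK's built-in XSLT 1.0 processor. The sample XSLT script below relies on xsl:result-document, an XSLT 2.0 instruction, so the task needs an XSLT 2.0 processor such as Saxon. A minimal sketch of the same call pinned to Saxon through the task's nested factory and classpath elements follows; the saxon.jar property is an assumption and must point at a Saxon jar on your machine.

```xml
<!-- Same transform as above, forcing Saxon (XSLT 2.0).
     "saxon.jar" is an assumed property holding the path to a Saxon jar. -->
<xslt force="true" style="${xslt.dir}/generate-xar-descriptors.xsl"
      in="${web.specs.dir}/${ant.project.name}/${ant.project.name}.xml"
      out="${temp.dir}/files/a.xml">
  <factory name="net.sf.saxon.TransformerFactoryImpl" />
  <classpath location="${saxon.jar}" />
  <param name="module-version" expression="${module-version}" />
  <param name="eXist-main-class-name" expression="${eXist-main-class-name}" />
</xslt>
```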
Sample XSLT Script
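The stylesheet writes four descriptor files with xsl:result-document: expath-pkg.xml, repo.xml, exist.xml (which maps the module's jar, Java class and namespace for eXist) and cxan.xml (metadata for the CXAN repository).
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">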
<xsl:output method="xml" />
<xsl:param name="module-version" />
<xsl:param name="eXist-main-class-name" />
<xsl:template match="/">
<xsl:variable name="module-namespace">
<xsl:copy-of select="//element()[@id = 'module-namespace']" />
</xsl:variable>
<xsl:variable name="module-prefix">
<xsl:copy-of select="//element()[@id = 'module-prefix']" />
</xsl:variable>
<xsl:variable name="spec-title">
<xsl:copy-of select="concat('EXPath ', //element()[local-name() = 'title'])" />
</xsl:variable>
<xsl:variable name="author">
<xsl:copy-of select="//element()[local-name() = 'author'][1]/element()[1]" />
</xsl:variable>
<xsl:result-document href="target/files/expath-pkg.xml">
<package xmlns="http://expath.org/ns/pkg" name="http://expath.org/lib/{$module-prefix}" abbrev="{concat('expath-', $module-prefix)}"
version="{$module-version}" spec="1.0">
<title>
<xsl:value-of select="$spec-title" />
</title>
<dependency processor="http://exist-db.org/" />
</package>
</xsl:result-document>
<xsl:result-document href="target/files/repo.xml">
<meta xmlns="http://exist-db.org/xquery/repo">
<description>
<xsl:value-of select="$spec-title" />
</description>
<author>
<xsl:value-of select="$author" />
</author>
<website />
<status>stable</status>
<license>GNU-LGPL</license>
<copyright>true</copyright>
<type>library</type>
</meta>
</xsl:result-document>
<xsl:result-document href="target/files/exist.xml">
<package xmlns="http://exist-db.org/ns/expath-pkg">
<jar>
<xsl:value-of select="concat('expath-', $module-prefix, '.jar')" />
</jar>
<java>
<namespace>
<xsl:value-of select="$module-namespace" />
</namespace>
<class>
<xsl:value-of select="concat('org.expath.exist.', $eXist-main-class-name)" />
</class>
</java>
</package>
</xsl:result-document>
<xsl:result-document href="target/files/cxan.xml">
<package xmlns="http://cxan.org/ns/package" id="{concat('expath-', $module-prefix, '-exist')}" name="http://expath.org/lib/{$module-prefix}"
version="{$module-version}">
<author id="{$author/element()/@id}">
<xsl:value-of select="$author" />
</author>
<category id="libs">Libraries</category>
<category id="exist">eXist extensions</category>
<tag>
<xsl:value-of select="$module-prefix" />
</tag>
<tag>expath</tag>
<tag>library</tag>
<tag>exist</tag>
</package>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
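The stylesheet only needs a few things from the input specification: an element whose id is module-namespace, an element whose id is module-prefix, exactly one title element (concat() fails on a sequence of several), and an author element whose first child element carries an id attribute used in cxan.xml. A minimal hypothetical input follows; the element names spec, person and code are illustrative only, since the stylesheet matches on id values and on the local names title and author.

```xml
<!-- Hypothetical input specification for generate-xar-descriptors.xsl -->
<spec>
  <title>Example Module</title>
  <author>
    <person id="jdoe">Jane Doe</person>
  </author>
  <code id="module-namespace">http://expath.org/ns/example</code>
  <code id="module-prefix">example</code>
</spec>
```

With this input and module-version set to 0.1, the generated expath-pkg.xml would have the name http://expath.org/lib/example, the abbrev expath-example, version 0.1 and the title "EXPath Example Module", and exist.xml would refer to the jar expath-example.jar.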
Sample XQuery Script
Acknowledgements
The Apache Ant target and the XSLT script were provided by Claudius Teodorescu.