What is Metadata?
Metadata is structured data that describes the characteristics of a resource. It shares many characteristics with the cataloguing that takes place in libraries, museums and archives. The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind. A metadata record consists of a number of pre-defined elements representing specific attributes of a resource, and each element can have one or more values. Below is an example of a simple metadata record:
Element name    Value
Title           Web catalogue
Creator         Dagnija McAuliffe
Publisher       University of Queensland Library
Identifier      http://www.library.uq.edu.au/iad/mainmenu.html
Format          Text/html
Relation        Library Web site
Each metadata schema will usually have the following characteristics:
a limited number of elements
the name of each element
the meaning of each element
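As a rough illustration of these characteristics, the sketch below models a schema as a small, fixed set of element names with their meanings, and checks the sample record shown earlier against it. The element meanings and the helper name `unknown_elements` are assumptions made for this example; they are not taken from any published schema.

```python
# Hypothetical schema: a limited set of elements, each with a name and a meaning.
# The wording of the meanings is illustrative only.
SCHEMA = {
    "Title": "The name given to the resource",
    "Creator": "The person or body responsible for the content",
    "Publisher": "The body making the resource available",
    "Identifier": "An unambiguous reference to the resource, such as a URL",
    "Format": "The data format of the resource",
    "Relation": "A related resource",
}

# The sample record; each element holds a list of values because
# an element can have one or more values.
record = {
    "Title": ["Web catalogue"],
    "Creator": ["Dagnija McAuliffe"],
    "Publisher": ["University of Queensland Library"],
    "Identifier": ["http://www.library.uq.edu.au/iad/mainmenu.html"],
    "Format": ["Text/html"],
    "Relation": ["Library Web site"],
}


def unknown_elements(record, schema):
    """Return the element names used in the record that the schema does not define."""
    return sorted(name for name in record if name not in schema)


if __name__ == "__main__":
    for name, values in record.items():
        print(f"{name} ({SCHEMA[name]}): {'; '.join(values)}")
    print("Undefined elements:", unknown_elements(record, SCHEMA))
```

Keeping the schema separate from the records means the same check can be reused for every record a community creates.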
Typically, the semantics describe the contents, location, physical attributes, type (e.g. text or image, map or model) and form (e.g. print copy, electronic file) of a resource. Key metadata elements supporting access to published documents include the originator of a work, its title, when and where it was published and the subject areas it covers. Where the information is issued in analog form, such as print material, additional metadata is provided to assist in locating the information, e.g. call numbers used in libraries. The resource community may also define some logical grouping of the elements or leave it to the encoding scheme. For example, Dublin Core may provide the core to which extensions may be added.
Some of the most popular metadata schemas include:
Dublin Core
AACR2 (Anglo-American Cataloguing Rules)
GILS (Government Information Locator Service)
EAD (Encoded Archival Description)
IMS (IMS Global Learning Consortium)
AGLS (Australian Government Locator Service)
While the syntax is not strictly part of the metadata schema, the data will be unusable unless the encoding scheme can express the semantics of the metadata schema. The encoding allows the metadata to be processed by a computer program. Important schemes include:
HTML (HyperText Markup Language)
SGML (Standard Generalized Markup Language)
XML (eXtensible Markup Language)
RDF (Resource Description Framework)
MARC (MAchine Readable Cataloging)
MIME (Multipurpose Internet Mail Extensions)
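To make the distinction between schema and encoding concrete, the following sketch encodes a short record as XML using Python's standard library. The namespace URI is the one used for the Dublin Core element set; the wrapper element `metadata` and the function name `record_to_xml` are assumptions made for this example rather than part of any standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def record_to_xml(record):
    """Serialise a record (element name -> list of values) as an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in record.items():
        for value in values:
            # One child element per value, e.g. <dc:title>...</dc:title>
            child = ET.SubElement(root, f"{{{DC_NS}}}{name.lower()}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


record = {"Title": ["Web catalogue"], "Creator": ["Dagnija McAuliffe"]}
xml_text = record_to_xml(record)

if __name__ == "__main__":
    print(xml_text)
```

The schema decides which elements exist and what they mean; the encoding only decides how they are written down so that a program can read them back.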
Metadata may be deployed in a number of ways:
Embedding the metadata in the Web page by the creator or their agent using META tags in the HTML coding of the page
As a separate HTML document linked to the resource it describes
In a database linked to the resource. The records may either have been directly created within the database or extracted from another source, such as Web pages.
The simplest method is for Web page creators to add the metadata as part of creating the page. Creating metadata directly in a database and linking it to the resource is growing in popularity as an activity independent of the creation of the resources themselves. Increasingly, the metadata is being created by an agent or third party, particularly to develop subject-based gateways.
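The first and third deployment options can be sketched together: a record is embedded in a page as META tags, and the same tags are later extracted so that the record could be loaded into a database. The `DC.` name prefix follows a common convention for Dublin Core in HTML; the function and class names are assumptions made for this example.

```python
from html import escape
from html.parser import HTMLParser


def to_meta_tags(record, prefix="DC"):
    """Render a record as HTML META tags, one tag per value."""
    tags = []
    for name, values in record.items():
        for value in values:
            tags.append(
                f'<meta name="{prefix}.{escape(name)}" content="{escape(value)}">'
            )
    return "\n".join(tags)


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from META tags whose name starts with a prefix."""

    def __init__(self, prefix="DC."):
        super().__init__()
        self.prefix = prefix
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith(self.prefix) and content is not None:
            element = name[len(self.prefix):]
            self.record.setdefault(element, []).append(content)


record = {"Title": ["Web catalogue"], "Creator": ["Dagnija McAuliffe"]}
page = f"<html><head>\n{to_meta_tags(record)}\n</head><body></body></html>"

extractor = MetaExtractor()
extractor.feed(page)

if __name__ == "__main__":
    print(page)
    print(extractor.record)
```

A real harvester would fetch pages over the network and cope with malformed markup; this sketch only shows the round trip between a record and its embedded form.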
2. Why use metadata?
The foregoing section has discussed the inadequacy of search engines in locating quality information resources. How does metadata solve the problem? A more formal definition of metadata offers a clue:
Metadata is data associated with objects which relieves their potential users of having full advance knowledge of their existence or characteristics. [DESIRE, p.2]
Information resources must be made visible in a way that allows people to tell whether the resources are likely to be useful to them. This is no less important in the online world, and in particular, the World Wide Web. Metadata is a systematic method for describing resources and thereby improving access to them. If a resource is worth making available, then it is worth describing it with metadata, so as to maximise the ability to locate it.
Metadata provides the essential link between the information creator and the information user.
While the primary aim of metadata is to improve resource discovery, metadata sets are also being developed for other reasons, including:
administrative control
security
personal information
management information
content rating
rights management
preservation
While this document concentrates on resource discovery and retrieval, these additional purposes for metadata should also be kept in mind.
3. Which elements, sub-elements and schemes should I use?
There is no simple answer to this question. At a fundamental level, it becomes a compromise, based on:
the specific needs of the local community to maximise information retrieval and management
the need to guard against making the creation of metadata and its maintenance more trouble than it is worth and therefore defeating its purpose
sustainability of the metadata schema in terms of keeping the records up to date
The bottom line is that a simple description is better than no description at all, as long as it can aid in the consistent discovery of resources.
The level of specificity in resource description is also important. The resources can be described individually or at a collection or aggregate level. It would be practically impossible to provide guidelines as to the appropriate level of specificity. Cataloguing librarians have been arguing the toss for years without reaching a consensus. As always, we should think in terms of customer needs. As noted above, with the major search engines, it is possible to have too many records, such that our customers can't see the forest for the trees. Initially, it would be sensible to allow the creators to determine which resources deserve their own record. If a collection-level record is used, it is important to add as much information as possible to ensure appropriate retrieval.
Acting on customer feedback is also important. Monitoring the search terms entered by customers is a well-proven technique for improving the quality and coverage of a database. The downside is that the assessment process is essentially a manual one.
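Although the assessment itself remains manual, the tallying that precedes it is easy to automate. The sketch below counts how often each search term was entered and which terms returned nothing. The log format, a list of (query, number of hits) pairs, and the sample entries are assumptions made for this example.

```python
from collections import Counter


def summarise_queries(log):
    """Tally queries from an iterable of (query, hit_count) pairs.

    Returns two Counters: how often each query was entered, and how often
    each query returned no results.
    """
    frequency = Counter()
    zero_hits = Counter()
    for query, hits in log:
        # Fold case and collapse whitespace so trivial variants count together.
        term = " ".join(query.lower().split())
        frequency[term] += 1
        if hits == 0:
            zero_hits[term] += 1
    return frequency, zero_hits


# Hypothetical log entries.
log = [
    ("Mabo legislation", 12),
    ("mabo  legislation", 9),
    ("labor law", 0),
    ("labour law", 7),
    ("labor law", 0),
]
frequency, zero_hits = summarise_queries(log)

if __name__ == "__main__":
    print("Most frequent:", frequency.most_common(3))
    print("No results:", zero_hits.most_common(3))
```

Terms that are entered often but find nothing are the obvious candidates for manual review.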
4. What about using controlled terminology?
Consistent use of language within metadata descriptions can aid in the consistent discovery of resources. The primary tool for ensuring consistent language usage is a controlled vocabulary, including the use of thesauri. A number of metadata elements would benefit from controlled values.
There are many subject thesauri available. However, most are designed for specialist resource communities. For example, the Edinburgh Engineering Virtual Library (EEVL) originally selected the Engineering Information thesaurus, but decided that it was too complex for the purpose. Instead they developed a modified version to suit their specific needs.
Ultimately, as the AGLS Metadata Element Set notes, "… a common sense, author-based approach is still effective and yields a high return to agencies." [AGLS1].
In the absence of a suitable subject thesaurus, some may be tempted to create one from scratch. This temptation is to be resisted at all costs. History is studded with failed attempts at developing new thesauri. It's like establishing a small business: people don't seem to understand that starting is easy, and that finding the resources to keep the thesaurus current is the real trick. Keeping a thesaurus up to date is a huge investment in resources that is very difficult to justify.
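The way a controlled vocabulary keeps element values consistent can be sketched as a lookup from variant terms to a single preferred term. The tiny vocabulary below is invented for this example and does not come from any real thesaurus; the function name `control_terms` is likewise an assumption.

```python
# Hypothetical mini-vocabulary: each preferred term lists its accepted variants.
VOCABULARY = {
    "Labour law": ["labor law", "employment law"],
    "Government departments": ["govt depts", "government depts"],
}

# Build a case-insensitive lookup from every variant (and the preferred term
# itself) to the preferred term.
LOOKUP = {}
for preferred_term, variants in VOCABULARY.items():
    LOOKUP[preferred_term.lower()] = preferred_term
    for variant in variants:
        LOOKUP[variant.lower()] = preferred_term


def control_terms(terms):
    """Map free-text terms to preferred terms.

    Returns (preferred, unrecognised): preferred terms without repeats,
    and the supplied terms that the vocabulary does not cover.
    """
    preferred, unrecognised = [], []
    for term in terms:
        key = " ".join(term.lower().split())
        match = LOOKUP.get(key)
        if match is None:
            unrecognised.append(term)
        elif match not in preferred:
            preferred.append(match)
    return preferred, unrecognised


if __name__ == "__main__":
    print(control_terms(["Labor Law", "labour law", "Astrophysics"]))
```

Unrecognised terms are returned rather than silently dropped, so that someone can decide whether the vocabulary needs a new entry.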
While not strictly a metadata issue, the mismatch between input and index terms has proven to be a major problem in retrieval from databases, particularly as a result of wording problems such as variant spellings and singular versus plural forms. Although the basic query interfaces for search engines seem similar, there are important differences that affect the outcome of the search. For example, the query 'Mabo Legislation' could be interpreted by different engines as requesting resources that contain:
the words 'Mabo' and 'legislation';
either of the words 'Mabo' or 'legislation';
the expression 'Mabo legislation' as a single unit.
Obviously, these three different interpretations will produce different sets of results. Search engines differ in whether queries are case sensitive and how they handle singular versus plural forms of a word. Alternative spellings, for example, labour and labor, may have to be searched separately. The same applies to abbreviations, such as dept and department. This disconcerts the naive user and annoys the experienced user. One solution is to use a common query interface, or an intermediate query engine which takes a standard query and translates it into the specific forms required by the site search engine.
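The three interpretations of 'Mabo Legislation' can be compared directly with a toy matcher. The sample documents and the mode names `all`, `any` and `phrase` are assumptions made for this example; real search engines also handle punctuation, stemming and ranking, which this sketch ignores.

```python
def tokens(text):
    """Split text into lower-cased words (punctuation is not handled)."""
    return text.lower().split()


def matches(document, query, mode):
    """Test a document against a query under one of three interpretations."""
    doc = tokens(document)
    terms = tokens(query)
    if mode == "all":      # every query word must appear
        return all(term in doc for term in terms)
    if mode == "any":      # at least one query word must appear
        return any(term in doc for term in terms)
    if mode == "phrase":   # the words must appear together, in order
        n = len(terms)
        return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical documents.
documents = {
    "a": "Mabo legislation passed by parliament",
    "b": "Legislation following the Mabo decision",
    "c": "Fisheries legislation review",
}


def search(query, mode):
    """Return the sorted keys of the documents that match the query."""
    return sorted(key for key, text in documents.items() if matches(text, query, mode))


if __name__ == "__main__":
    for mode in ("all", "any", "phrase"):
        print(mode, search("Mabo Legislation", mode))
```

An intermediate query engine of the kind described above would sit in front of functions like `matches`, translating one standard query form into whichever interpretation each underlying engine expects.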
5. Where will the metadata be stored?
Metadata may be deployed in a number of ways:
Embedding the metadata in the Web page by the creator or their agent using META tags in the HTML coding of the page
As a separate HTML document linked to the resource it describes
In a database linked to the resource. The records may either have been directly created within the database or extracted from another source, such as Web pages.
The simplest method is to ask Web page creators to add the metadata as part of creating the page. To support rapid retrieval, the metadata should be harvested on a regular basis by the site robot. This is currently by far the most popular method for deploying Dublin Core. An increasing range of software is becoming available to assist in adding metadata to Web pages.
Creating metadata directly in a database and linking it to the resource is growing in popularity as an activity independent of the creation of the resources themselves. Increasingly, the metadata is being created by an agent or third party, particularly to develop subject-based gateways. The University of Queensland Library is involved in a number of gateway projects, including AVEL and Weblaw.
6. How does one create metadata?
The more easily the metadata can be created and collected at the point of creation of a resource or at the point of publication, the more efficient the process and the more likely it is to take place. There are many tools available for this purpose and the number continues to grow. Such tools can be standalone or part of a package of software, usually with a backend database or repository to store and retrieve the metadata records. Some examples include:
DC-dot - http://www.ukoln.ac.uk/metadata/dcdot/. This service will retrieve a Web page and automatically generate Dublin Core metadata, either as HTML <meta> tags or as RDF/XML, suitable for embedding in the <head>...</head> section of the page.
DCmeta - http://www.dstc.edu.au/RDU/MetaWeb/generic_tool.html. Developed by Tasmania Online. It is based on the SuperNoteTab text editor and can be customised.
HotMeta - http://www.dstc.edu.au/Research/Projects/hotmeta/. A package of software, including metadata editor, repository and search engine.
Ideally, metadata should be created using a purpose-built tool, with the manual creation of data kept to an absolute minimum. The tool should support:
Inclusion of the syntax in the template (e.g. element name, sub-element, qualifier)
Default content, which can be overridden
Content selected from a list of limited choices (e.g. Function, Type, Format)
Validation of mandatory elements, sub-elements, schemes and element values
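The four requirements above can be sketched as a template that drives record creation: the template carries the element names, supplies defaults that can be overridden, restricts some elements to a list of choices, and flags missing mandatory elements. The template contents and the function name `build_record` are assumptions made for this example and do not describe any of the tools listed earlier.

```python
# Hypothetical template: element name -> rules for that element.
TEMPLATE = {
    "Title": {"mandatory": True},
    "Creator": {"mandatory": True},
    "Publisher": {"default": "University of Queensland Library"},
    "Format": {
        "default": "text/html",
        "choices": ["text/html", "text/plain", "image/gif"],
    },
}


def build_record(template, supplied):
    """Build a record from supplied values, applying defaults and validation.

    Returns (record, errors). Elements that fail validation are left out of
    the record and reported in errors.
    """
    record, errors = {}, []
    for name in supplied:
        if name not in template:
            errors.append(f"{name}: not defined in the template")
    for name, rules in template.items():
        # A supplied value overrides the default, if there is one.
        value = supplied.get(name, rules.get("default"))
        if value is None:
            if rules.get("mandatory"):
                errors.append(f"{name}: mandatory element is missing")
            continue
        choices = rules.get("choices")
        if choices and value not in choices:
            errors.append(f"{name}: {value!r} is not one of {choices}")
            continue
        record[name] = value
    return record, errors


if __name__ == "__main__":
    print(build_record(TEMPLATE, {"Title": "Web catalogue", "Creator": "Dagnija McAuliffe"}))
    print(build_record(TEMPLATE, {"Title": "Web catalogue", "Format": "video/mpeg"}))
```

Because the rules live in the template rather than in the code, a community can change its mandatory elements or permitted values without altering the tool itself.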
http://scholar.google.es/scholar?q=LEARNING+OBJECT+DEFINITION+AND+CHARACTERISTICS&hl=es&um=1&ie=UTF-8&oi=scholart