Friday, November 28, 2008

Deep Linking to a section of a Youtube Video

Here's a clever little trick to pass on to students and staff who want to show a section of a YouTube video without streaming the entire thing.

Just append #t=XmYs to the URL, where X is how many minutes into the video and Y is how many seconds. These values are displayed at the bottom right of the YouTube video player's toolbar if you want to get exact timings.

For example:
http://www.youtube.com/watch?v=vahx4rAd0N0#t=1m5s

This starts the clip at the 1 minute 5 second mark (in practice it actually starts about 2 seconds earlier). Note in the screenshot that the red line tracking the progress of the stream doesn't start from the beginning of the video but from your nominated start point.
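
If you want to build these links in bulk (for a reading list, say), it's one line of string handling. A minimal sketch; the helper name is illustrative only:

    # Minimal sketch: build a YouTube deep link using the #t=XmYs fragment
    # described above. The function name is illustrative only.
    def youtube_deep_link(video_url: str, minutes: int, seconds: int) -> str:
        return f"{video_url}#t={minutes}m{seconds}s"

    print(youtube_deep_link("http://www.youtube.com/watch?v=vahx4rAd0N0", 1, 5))
    # -> http://www.youtube.com/watch?v=vahx4rAd0N0#t=1m5s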



Saves you the embarrassment of a slow download and irrelevant lead material.

More from the YouTube blog

Thursday, November 27, 2008

Cloud Computing

Had a chance to see Kent Adams' (Director IT&R) dry run of his presentation on cloud computing on Wednesday.

Read the Wikipedia entry on cloud computing

It's also referred to as SaaS (Software as a Service); I'm sure it used to be called Application Service Provision (ASP), and before that 'thin client computing' (and even before that, mainframe/dumb terminal), but I'm showing my age.

Basically a few players (notably Microsoft and Google) are offering to host services (from email to the Office suite) at prices significantly lower than we can provide them for, and arguably with a lot more utility. They do this through sheer economies of scale and a massively distributed network of datacentres/servers. If IT&R did move to that model of service provision, they would remove themselves from the Sisyphean cycle of hardware and network upgrades, backup and maintenance tasks, the impossibility of meeting increasing user expectations, and a significant user support burden.

Potential downsides include:
  • our internet connection becomes crucial in IT service delivery
  • that the price today may not be the price tomorrow (Kent quoted Scott McNealy's take "the first heroin fix is free")
  • the loss of control particularly over security and privacy
The pluses include:
  • tapping into the resources of these giants (Kent was clearly impressed that Google had 350 software engineers IN AUSTRALIA ALONE - so am I)
  • proven reliability - can you remember Google being down?
  • having access to the constant improvements and additional products that are developed on behalf of all customers
  • not having to deal with the I: drive, students having gigabytes of storage they can access the same way from anywhere
I wrote an issues paper (in response to our I: drive woes) about this much earlier in the year for Heather and Kent and came across this quote from Kari Barlow, Assistant Vice President, University Technology Office, Arizona State University along the lines of ‘Internet services are no longer a cottage industry, not every institution has to build their own from scratch anymore’. ASU have partnered with Google to provide their students with email accounts.

Kent noted that we are already using this model for some services; SpendVision and Serials Solutions are examples.

Kent wasn't presenting it as a fait accompli but it was certainly worthy of consideration. Very cool to see our IT people take the possibilities seriously.

Bought the T-Shirt? See the movie. A 6 minute intro to cloud computing - clear and simple:

Monday, November 3, 2008

Drowning in the Possibilities


I've been prepping for the Professional Development Day (Library 2.0) in Townsville in November and for the Library Planning Days (Rethinking the Virtual Library) the week after and all the reading is making my head feel like a glass of dirty water - I'm just waiting for the sediment to settle.

If I was a tag cloud the big words would be:
usability, information architecture, EBL, and user-centric design.

The CMS project rumbles along in the background and the ninjas are currently working on removing references to pages on the old site. There are still publishing issues which are proving difficult to track down. Remember that my monthly reports are on the Intranet as are all the managers' reports and the management committee minutes.

What I've Been Reading

Google Reaches Settlement with Publishers on Google Book Search
"Three years ago, the Authors Guild, the Association of American Publishers and a handful of authors and publishers filed a class action lawsuit against Google Book Search.

Today we're delighted to announce that we've settled that lawsuit and will be working closely with these industry partners to bring even more of the world's books online. Together we'll accomplish far more than any of us could have individually, to the enduring benefit of authors, publishers, researchers and readers alike.

It will take some time for this agreement to be approved and finalized by the Court. For now, here's a peek at the changes we hope you'll soon see."

Of course there is no indication what this means for the world outside United States borders. Nor do I see how the plaintiffs can make an agreement on behalf of publishers and authors who are not domiciled in, or citizens of, the US.

What if Google did go broke? Where would all that scanned data go? The answer is Hathi.


Jarvis, Jeff. Let's junk the myths and celebrate what we've got. The Guardian, September 29, 2008.
"It never fails. I'll be talking with a group about the amazing opportunities of the internet age and inevitably someone will pipe up and say, 'Yes, but there are inaccuracies on the internet.' And: 'There are no standards there.' ...There the conversation stalls....Once and for all, I'd like to respond to these fears and complaints."

"Reinforcing its place in the scientific community, the arXiv repository at Cornell University Library reached a new milestone in October 2008: Half a million e-print postings -- research articles published online -- now reside in arXiv, which is free and available to the public."
http://arxiv.org/

Bibliographic Software Wars? EndNote vs Zotero / Thomson Reuters vs George Mason University: proprietary data formats in an open source world

Nature reports on the $10 million lawsuit Thomson Reuters (makers of EndNote) have filed against George Mason University (GMU), the birthplace of Zotero (the Firefox plugin that "allows researchers to share their digital information, iTunes style, whether it is in the form of citations, documents or web pages").

The article discusses the case and the wider implication it has - what if OpenOffice can no longer save or open documents stored in Microsoft's proprietary format?

The ECAR study of undergraduate students and information technology, 2008
This 2008 ECAR research study is a longitudinal extension of the 2004, 2005, 2006, and 2007 ECAR studies of students and information technology. The study is based on quantitative data from a spring 2008 survey of 27,317 freshmen and seniors at 90 four-year institutions and eight two-year institutions; student focus groups that included input from 75 students at four institutions; and analysis of qualitative data from 5,877 written responses to open-ended questions. In addition to studying student ownership, experience, behaviors, preferences, and skills with respect to information technologies, the 2008 study also includes a special focus on student participation in social networking sites.
Released in time for the Educause meeting, I'm very interested to hear what Heather has to report back - hopefully we'll get a taster at the Professional Development Day.

Express printer solves problem of out-of-print textbooks
Kate Elder passed this one on - but what an eminently cool idea. Books printed at the point of need, no overruns being pulped by the pallet load, no global shipments of books by freight, reducing the publishing industry's carbon footprint.

No Brief Candle: Reconceiving Research Libraries for the 21st Century
PDF free, print version available for a fee.

How should we be rethinking the research library in a swiftly changing information landscape?

In February 2008, CLIR convened 25 leading librarians, publishers, faculty members, and information technology specialists to consider this question. Participants discussed the challenges and opportunities that libraries are likely to face in the next five to ten years, and how changes in scholarly communication will affect the future library. Essays by eight of the participants—Paul Courant, Andrew Dillon, Rick Luce, Stephen Nichols, Daphnée Rentfrow, Abby Smith, Kate Wittenberg, and Lee Zia—were circulated to participants in advance and provided background for the conversation. This report contains these background essays as well as a summary of the meeting.


Thursday, October 2, 2008

Google: back then, in SFX statistics now, and in federated searching in the future

Ghost of Google's past

First some fun: Google, celebrating its 10th birthday, has released its oldest available index (January 2001) http://www.google.com/search2001.html - of course there's a ton of broken links, but there are alternative links to the content through the Internet Archive.

I did the obligatory vanity search and found the first thing I ever marked up in HTML (using vi, back when it was really ugly). Then I tried "twin towers" 911 and got some vacation apartments and one eerily prescient entry from Google Directories:

Business Contingency - http://www.BusinessContingency.com
Few businesses survive an interruption that lasts for more than 10 days. Two thirds of the businesses in the NYC twin towers did not recover. Will you?

Google in our SFX stats

September was something of a red letter month for Google Scholar. For the first time it became the biggest overall source of SFX requests, and, also for the first time, it became the biggest source of SFX requests from X Search (Metalib) after the Expanded Academic Index.

The rise of Google Scholar tells us something about our users and their desire for a simplified search. We have never 'championed' Google Scholar, although I know some liaison librarians will show it to students. Our only acknowledgement that it exists is a one-liner on a relatively 'deep' page, and, I think, the instructions for accessing SFX in Scholar from off campus are in an externally hosted blog that you can't find using our search engine. In spite of all this, it is the most popular route clients have to our esubscription content.

Future of Google Scholar in Federated Searching

Serials Solutions announced last week that Google Scholar is no longer available for federated searching through 360 Search because it's not allowed under Google's Terms of Use. SS founder Peter McCracken has blogged the change and its implications, which makes for interesting reading. The support site was succinct:
Google's Terms of Use state that any federated search engine, such as 360 Search or WebFeat, is not allowed to display results from Google properties. In order to satisfy Google's terms, Serials Solutions will be terminating any connections to Google content in both 360 Search and WebFeat, effective immediately. If you would still like to include Google in your federated search interface, please send a request to Support to have it added as a "link-only resource" -- meaning that there will still be a link to the Google native search in your interface, but Google results will no longer be included in the federated search results.

For more information about Google's Terms of Use please visit: http://www.google.com/accounts/TOS. The specifics can be found in section 5.3.

Thursday, September 11, 2008

Touching base, what I am doing

Apologies for my slackness in writing lately. I am all but drowning in the CMS conversion. I'd like to thank the Ninjas, and especially Sharon Bryan, for their work knocking off the last pointy edges before a 'real' trial publish.

One thing I think all the ninjas agree on is that we really need to do a review of our site. Because the site has had bits tacked on on an 'as needs' basis, we are finding lots of content that works on its own but not as a coordinated part of the library site. There is massive duplication, particularly of contact information. There are also a lot of broken internal links: things like the new book lists have been moved, but older pages linking to them have slipped through the cracks - and to add to the embarrassment, those broken links seem to have been on display for literally years.

Again due to the way the site has evolved over time there is little consistency of 'voice' or format (every form looks different).

I think the two big areas for us to focus on post launch are:
  1. A review (with a lot of observational user studies) and a redesign with a view to making the site user-centric based on what we learn from the review
  2. A commitment to quality assurance, making the existing content meet standards for accessibility, usability, voice, granularity, consistency, currency, relevance; and building mechanisms (both automated and organisational) to ensure new content meets those standards.
We also have a massive link checking job ahead of us in the subject guides, and I think it's time we started thinking about how we approach resource discovery in our subject disciplines.

I also expect the results of the Client Survey to feed into the areas identified as most important to our clients.

A little factoid I calculated is that 100 of our 6000+ pages generate 92% of our hits - perhaps our long tail is a little too long?
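
If you want to recalculate that sort of figure from the web logs yourself, it only takes a few lines. A rough sketch, assuming a hypothetical per-page hit-count export (a pagehits.csv with page and hits columns):

    # Rough sketch: what share of total hits do the top N pages account for?
    # Assumes a hypothetical CSV export of per-page hit counts (columns: page, hits).
    import csv

    def top_n_share(path: str, n: int = 100) -> float:
        with open(path, newline="") as f:
            hits = sorted((int(row["hits"]) for row in csv.DictReader(f)), reverse=True)
        return sum(hits[:n]) / sum(hits)

    # e.g. print(round(top_n_share("pagehits.csv") * 100), "% of hits from the top 100 pages")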

I'm also doing the preliminary design of our 360 Search/Link implementation, helping out with the Library Planning Day committee, and organising the impending Horizon upgrade (3rd of October). If I die I like dark red roses and donations to Amnesty International and Medecins Sans Frontieres.

Tuesday, August 5, 2008

The Virtual Library - where to next?

I was going to post about SirsiDynix releasing their Enterprise V1.0 product. I've read the media release and perused the web site at http://www.sirsidynix.com/Solutions/Products/portalsearch.php and I'm still not sure I see anything to get overly excited about. Enterprise is a layer that sits on top of the OPAC (in our case HIP) which provides a few bells & whistles, like faceted results analysis, profiles for specific user groups, some fuzzy searching logic, and a little web2ish content integration (cover images, for example).

It might be attractive to a library with a number of discrete collections, or a consortium, but it seems tied to physical collections and increasingly our collection development revolves around the virtual (ie electronic/digital).

I feel like we can probably stick with our current ILMS for two years before we'd enter a review phase about what we do next. Horizon is now a dead end, if stable, product (after the 7.4.1 upgrade in September). It will continue to be the chief management tool of our physical collections from acquisition to circulation at least until that review.

I think we need to step back and think about how all our resources can best be delivered to our clients and look for tools that allow us to do that, rather than acquiring systems and then trying to figure out how to make them do what we want.

In the last client survey the one area where we lost ground, admittedly not by much, was the 'virtual library' section. Personally I didn't find this a surprise, even though I think the resources we provide are better than any we have provided before. I believe the rapid acquisition of resources and entry points to those resources (think X Search, LearnJCU, Reserve Online, 30,000+ ejournal subs, 300+ I&A/FT databases, numerous guides, VISA, LearningFast, remote access, library policies, rules and regs) has swamped an information architecture firmly rooted in a much less virtual information world.

It is time to seriously look at our approach, both in philosophy and technology. I believe we need a more client-centred and context-centred approach. I often ponder why we silo off library materials and services from the rest of the student's learning experience. Are we not a key part of the process that creates the perfect graduate? Why aren't our services seamlessly integrated with teaching materials, at the point where they are most relevant? For example, why does a student who has logged into LearnJCU and selected a particular subject have to log in again to Reserve Online and then enter the subject code to see the readings for the subject? We already know who they are and what subject they're doing. Why aren't the reading lists embedded in the course materials, with links directly to the item's full text? I think we should be asking these questions.

Friday, August 1, 2008

Link Resolver statistics and Collection Building : one small step?

I've just been perusing the monthly SFX statistics that I've set up as automatic monthly emails and decided to actually try doing something with the stats rather than just record them. Great job for a Friday.

The particular stats I played with today are the 'Books accessed via SFX ranked by use'. These stats indicate which books were returned in searches (both within Metalib and in the database UIs in which we've embedded the SFX service) and then attracted a client's attention enough to click on 'Find It'. The stats show how many times this happened for each ISBN.

I reasoned that this data could be a useful indicator of both the titles and subject areas clients are searching for which in turn would aid acquisitions decisions by:
  • Identifying high demand titles not in the collection
  • Highlighting subject areas under represented in the collection
  • Giving liaison librarians an insight into the information needs of their clients
What the report doesn't show is whether we hold the item in any format, so a manual check of the catalogue is required. The only identifying metadata for the title available is ISBN.

I decided to check all the ISBNs that had more than two requests and, if they were not part of the JCU collection, list the bibliographic details with a link to more information (either from a publisher, a vendor or Google Books), which would further aid the decision to purchase or not.
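
The check itself is mechanical enough to script. A rough sketch of that triage step, with hypothetical file names and column headings (the Google Books link uses the standard ISBN lookup URL):

    # Sketch of the triage step: keep ISBNs requested more than twice that we don't
    # already hold, and attach a Google Books link to aid the purchase decision.
    # The file names and column headings are hypothetical.
    import csv

    with open("jcu_holdings_isbns.txt") as f:          # one held ISBN per line
        held = {line.strip() for line in f if line.strip()}

    candidates = []
    with open("sfx_books_by_use.csv", newline="") as f:
        for row in csv.DictReader(f):                  # expects columns: isbn, requests
            isbn, requests = row["isbn"].strip(), int(row["requests"])
            if requests > 2 and isbn and isbn not in held:
                candidates.append((requests, isbn,
                                   "http://books.google.com/books?vid=ISBN" + isbn))

    for requests, isbn, link in sorted(candidates, reverse=True):
        print(requests, isbn, link)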

Two side points:
  • About 40% of the requests had no ISBN, making them impossible to trace using the report alone
  • The clickthrough rates for books are about half what we get for journal articles
It's just a trial to gauge its perceived utility with the liaison librarians, and a bunch of improvements could be made. For example, I'd like to break it down by faculty/school.

I'm about to send it to the liaison librarians to see what they think. Maybe they'll even comment on this blog.

Thursday, July 24, 2008

Open Source Integrated Library Management Systems

I necessarily subscribe to a bunch of lists and feeds. The flow can be overwhelming, but if you get into a simple scanning routine you notice patterns in the ebb and flow of the communal mind that is the web. In addition, I actually talk to you about issues and ideas. For this post I thought I'd just mention some of the things that have caught my eye/mind recently.

Open Source Integrated Library Management Systems are a key watch topic with QULOC-ICT. The two main systems are Evergreen and Koha; the general feeling is that these systems don't have the full range of modules the commercial providers offer, but they do work. On web4lib there has been a lot of discussion, not about the software itself, but about hiring third parties to maintain and develop it.

The software is free but support is not. But, unlike the pricing model the commercial providers use (an annual license/maintenance fee that has no connection with what you actually get, and additional costs for customised development) you negotiate with the third party when you want something done or get quotes from a number of third parties.

Sidebar: Talk to an ex-salesperson from one of the big ILMS providers and you'll find some truly bizarre formulas for calculating costs (which are usually manipulated so that a pre-decided price is the answer): take the number of campuses, students, staff and books, then multiply, divide, add, subtract, sine, cosine, tangent them in an order of their choosing.

Increasingly, libraries are dipping their toes in the Open Source Software (OSS) water by using it as an additional layer or view over existing data stores, for example the NLA's use of VuFind as an alternative OPAC; Charles Darwin University is experimenting with it as well.

A number of libraries have announced an actual dive into the water. In the last month the following libraries have gone KOHA:
And these libraries have gone with Evergreen recently:
As more libraries move to open source, opportunities open up for competitors to Equinox (Evergreen), LibLime (Koha) and Media Flex (OPALS).

We are currently planning to upgrade Horizon to 7.4.1 in the mid-semester break, more information as we approach that period.

Further reading

The JISC review: Library Management Systems: Investing wisely in a period of disruptive change
Library Journal article: Automation System Marketplace 2008: Opportunity Out of Turmoil

Thursday, July 10, 2008

Open Source competitor for Dewey

Tim Spalding of LibraryThing has made a call for volunteers to create an open source alternative for the Dewey Decimal Classification system. It seems Tim's ultimate aim is to have the LibraryThing 'collection' classified using this new system, apparently to allow users to 'browse' virtually.

The comments on the blog veer from the derogatory to heart-on-sleeve enthusiasm.

You can follow the discussions in the LibraryThing group: Build the Open Shelves Classification. So far the discussion seems to be struggling to get out of a DDC/LCC mindset, though some interesting ideas have been put forward. Personally I find that accommodating a one dimensional ordering (ie shelf ordering) is hampering the discussion and I think a more three dimensional approach would be more useful in the online context. It makes me think of faceted classification systems that I barely recall from an obscure part of my library degree a couple of decades ago.

What I find interesting is that LibraryThing senses a need for something more structured than the folksonomy approach that is its signature. And in the same week, Web4Lib started talking about ChaCha, a search engine that combines human mediation with search engines to give users 'answers rather than a list of links'. Does this signify a shift in the zeitgeist about the costs and benefits of human vs machine processing of information? Probably not. But it's a timely reminder that a trained human brain and structured information resources aren't yet anachronisms.

Tuesday, July 8, 2008

Academic Live and Live Books fade away

Go away for a couple of months and everything changes.

On the 23rd of May 2008 Microsoft announced that it was pulling the plug on Live Search Academic and Live Search Books, two products it launched as direct competition for Google Scholar and Google Books respectively.

Though never as widely used as their Google counterparts, they were useful products, and it is sad to see the work done in digitising 750,000 books and indexing 80,000,000 journal articles disappear from the information retrieval landscape.

In the announcement, Satya Nadella, Senior Vice President of Search, Portal and Advertising at Microsoft, states that it is a business decision, as they have decided to "focus on verticals with high commercial intent, such as travel, and offer users cash back on their purchases from our advertisers". The announcement states that the company does not see a sustainable business model for the services in the current environment.

The existing indexes will be merged into the Live Search engine http://search.live.com/

Friday, July 4, 2008

Link Resolver stats overview for first half of 2008

Just a brief glimpse of the usage statistics for SFX year-to-date. See the full reports on the Library Intranet (JustUs), password required.

Requests: 207,247 (number of clicks on the 'Find It' button) - up 15% on 2007

Click-throughs: 161,891 (the number of times the link resolver either found an e-holding OR the user clicked on the 'Search Catalogue' link when no e-holding was found) - up 11% on 2007

The databases providing the most link resolution requests are:
  • CINAHL
  • PsycInfo
  • Google Scholar
  • Web of Knowledge/Web of Science
  • Medline
In addition to ejournal holdings, our link resolver initiated 23,850 catalogue searches and delivered 2,974 ebooks.

The top epublishers were:
  • Elsevier
  • Gale
  • Proquest
  • Blackwell
  • Free Ejournals
  • Informaworld
  • Springer
  • Wiley
  • Ovid
  • Sage
The five most popular journals were:
  • Marine ecology progress series
  • The Australian journal of rural health
  • Aquaculture
  • Science
  • Marine biology
Approximately 70% of requests resulted in a fulltext service

Wiley InterScience swallowing Blackwell Journals Online (Synergy)

The Wiley merge of Blackwell Journals hasn't been as smooth as hoped. The database search appears to be working, X Search returns the same number of hits for both Wiley and Synergy, and the Wiley and Synergy entries in the A-Z list of databases both point to the same page (Wiley have arranged for all traffic to the old Synergy address to go to their site, but I'm not sure whether this is due to a redirect or a DNS change).

The issues we are having are with our link resolver (Find It). There has been a lot of discussion on the SFX listserv about users accessing the Synergy titles being sent to either the Wiley home page or to the journal's home page on Wiley.

Below is some information on the problem, its ETA for resolution, and some stats on how the Wiley servers are coping with the additional journals (JCU had around 900 Blackwell titles, and Synergy was the second most accessed ejournal publisher in the first five months of 2008; Wiley was seventh).

Dear all - Here is an update on some of the issues raised today about the Wiley InterScience transition:

Downtime - The site went down and was unavailable for a short time this afternoon and last night, UK time. We have now identified the same cause of both these periods of downtime and are therefore able to fix it to prevent it occurring for this reason again.

Access to subscribed journals - if you find that your access to particular titles is not set up, such as for non-Collection titles, then please submit your request via our Customer Services site as I suggested:
http://www.interscience.wiley.com/support. This serves two purposes; 1) it will get your problem fixed and your access set up as you need it, and 2) it will enable our engineers to judge whether there is truly a pattern in the types of requests we're getting and therefore will help them to find a generic fix if there is.

OpenURL linking - we are working on this so that OpenURL / SFX links to Blackwell Synergy pages are redirected to the nearest equivalent page on Wiley InterScience rather than the homepage as is happening now. The fix may take a few days so apologies for this. Links to the Blackwell Synergy URLs which are constructed like so
http://www.blackwell-synergy.com/loi/jan are working.

Athens - we are working to resolve this so that authentication works for all journals. It currently seems to work for some journals and not others.

Google Scholar - Lesley Crawshaw pointed out that yesterday Google Scholar search results weren't resolving to articles on Wiley InterScience for Blackwell journals. This was because yesterday Google chose not to search the site. We are checking that they are now indexing the site as promised.

DOIs - Colin MacLean raised the issue of DOI links not resolving to the appropriate article. We think this is because the article hasn't yet been loaded onto Wiley InterScience. As I mentioned, there is still some content which we are working through to upload and this can be found on our transition site:
http://www.interscience.wiley.com/transition

Site activity - Just for your information I wanted to give you some feedback about initial activity on Wiley InterScience now that we've had one day with the added 1.6 million Blackwell articles. Since yesterday the response times have improved with 99% of pages being served in less than 2 seconds, usage is up 77% over the same period last week, and content delivered to customers is up 60% over last week, alongside the 43% increase in content on the site. Once we have resolved these final access, linking and content issues, we will of course expect this to rise still further.

If there are any other errors you'd like to report then please submit them via our Customer Services site as that way we can monitor them, report on them, fix them, and respond directly to you about them.

Wednesday, April 2, 2008

What does a library technologies coordinator do?

Mostly I just panic.

Seriously though - this is a new position at JCU Library and although there are some clear goals it's also clear that the job, like the technologies themselves, will evolve with the library's requirements.

My current major projects are listed (and occasionally described) on the JCU Library Intranet (JustUs) in the technology sub site I'm developing.

Away from specific projects I see my role as:
  • liaising between the Library and Information Technology & Resources
  • a sounding board for staff (within and without the library) on issues involving the intersection of IT and IM
  • helping the library pick tech 'winners'
  • sharing and building knowledge, and helping and encouraging library staff to develop their IT skills
  • trying to get a handle on what's happening in library technology globally, nationally and locally and determining best fits
  • coordinating projects that contribute to the Library's strategic goals
  • identifying opportunities for the library to contribute to the university's strategic goals
The challenge is the volume of possibilities weighed against the limited resources we have to investigate or implement them. As a small institution, early adoption of new technologies is a risk attractor for us. But it's a fine line between caution and inaction. And as Heather Gordon has stated - we can't keep providing new services without addressing what services we no longer need to provide.

My other challenge is communicating with library staff. I worry that a constant stream of emails makes you invisible, but blogs and intranets are too passive. It's up to me to build relationships where our staff are comfortable seeking information and assistance, as well as mechanisms where staff know what's happening with library tech when they need to know it. Suggestions and thoughts are most welcome.

Monday, March 3, 2008

Google and CrossRef

Somehow stumbled across this while following up something completely different: the Google CrossRef Pilot. Back in 2004, 45 leading journal publishers who were CrossRef members signed up with Google for the CrossRef Search Pilot.

Each publisher negotiated arrangements with Google for which parts of their sites would be indexed. Each publisher has a search page (hidden somewhere on their site) which searches all the indexed content of the pilot's participants. Effectively it's a mini Google Scholar, except that with publishers like Blackwell, Cambridge & Oxford, Biomed Central, Springer, Karger, Taylor & Francis, Thieme and Wiley it's hardly 'mini'.

Ed Pentz from CrossRef contacted me to say the project was 'on hold' but that Google was still indexing the 45 publishers so it is still up to date. Searching from on campus means you will be able to view full text of items retrieved where we have subscription access based on IP restriction (haven't tested it remotely).
Try it:


or if that doesn't work try it from Nature Online.

You can still see the original press release.

It's unclear what Google's future intentions are - but unlike Google Scholar, at least the Pilot gives the researcher some indication of what's being searched. You can make your own Google search a CrossRef search by adding 'restrict=crossref' to the search URL, e.g.

Turn
http://www.google.com/search?q=ulysses+butterfly

into

http://www.google.com/search?q=ulysses+butterfly&restrict=crossref
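
If you do this often, it's easy to script. A minimal sketch that appends the parameter to an existing search URL:

    # Minimal sketch: turn an ordinary Google search URL into a CrossRef-restricted
    # one by adding the restrict=crossref parameter described above.
    from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

    def crossref_restrict(url: str) -> str:
        parts = urlparse(url)
        query = dict(parse_qsl(parts.query))
        query["restrict"] = "crossref"
        return urlunparse(parts._replace(query=urlencode(query)))

    print(crossref_restrict("http://www.google.com/search?q=ulysses+butterfly"))
    # -> http://www.google.com/search?q=ulysses+butterfly&restrict=crossref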

Wednesday, February 20, 2008

Google Scholar and commercial publishers

We're currently reviewing our options for federated searching and link resolution services. We've opted to identify possible scenarios (within resource constraints). One possible scenario is to opt for Google Scholar as the federated search tool (some universities have gone down this path, e.g. the University of Pretoria).

Arguably, if we knew which providers of full text Google Scholar crawled, we could use it as a federated search tool and let our institutional subscriptions provide access to content (via IP address restriction).
There's the rub, though. Google are remarkably tight-lipped about what and who they are indexing. It's not clear if that's anything more than apathy.

As background for our review, I asked the web4lib list if anyone had seen or built a canonical list. This generated some discussion about Google's recalcitrance. Bill Drew wondered if anyone had actually asked Google for a list of what was indexed. Roy Tennant confirmed that he had asked Anurag Acharya (Google Scholar's lead engineer) that question directly and 'got nowhere'. Corey Murata confirmed that and provided a link to the Google Librarian Central transcript of Tracey Hughes' interview with Acharya:
TH: Why don't you provide a list of journals and/or publishers included in Google Scholar? Without such information, it's hard for librarians to provide guidance to users about how or when to use Google Scholar.
AA: Since we automatically extract citations from articles, we cover a wide range of journals and publishers, including even articles that are not yet online. While this approach allows us to include popular articles from all sources, it makes it difficult to create a succinct description of coverage. For example, while we include Einstein's articles from 1905 (the “miracle year” in which he published seminal articles on special relativity, matter and energy equivalence, Brownian motion and the photoelectric effect), we don't yet include all articles published in that year.

That said, I’m not quite sure that a coverage description, if available, would help provide guidance about how or when to use Google Scholar. In general, this is hard to do when considering large search indices with broad coverage. For example, the notes and comparisons I have seen about other large scholarly search indices (for which detailed coverage information is already available) provide little guidance about when to use each of them, and instead recommend searching all of them.
Will Kurt suggested that we could create our own wiki list of publishers - if someone could set it up ... and then realised he could, through his lib-bling.com site:

http://lib-bling.com/scholar/index.php?GoogleScholar


Tuesday, February 19, 2008

A new approach to web resource discovery

At JCU we've had static lists of subject-based web resources since the dawn of 'before my time'. This approach evolved directly from the 'Pathfinder' model I first saw as an undergrad circa 1989. A paper list of in-building (mostly) paper resources.

Now we have an electronic list of electronic resources with almost standard groupings like 'Databases', 'Ejournals', 'Associations & Organisations', 'General', 'Specific' etc. Over the years individual guides have mutated from the original template, based on the nature of the subject and the preferences of the author.

These tools provide a menu of resources for the 'diner' to peruse over a leisurely lunch, rather than providing a drive through window for the student in a hurry. The choice to browse rather than search is often a product of need and time.


Browsing aids indepth knowledge (and often requires it).

Searching often satisfies an immediate need and requires less subject knowledge (particularly in assignments with set topics).

Can we provide one tool to support both needs?


The database sections of subject guides can also be an administrative burden. Many cite the same cross-disciplinary databases, so when a name or IP address changes the edit has to be replicated in multiple files. Currently we store this information in at least two other places:

  1. The catalogue, which in turn generates the static A-Z listing on the web site
  2. In X Search (the JCU implementation of Ex Libris' Metalib).
Conceivably it should be stored in our ERM as well, although I'm told it currently isn't. It seems obvious that reducing data maintenance by having a central store, 'pulling' a list of relevant databases out of it dynamically, and embedding them in the resource guide is preferable to maintaining multiple lists. And why not embed a search form in the subject guide that uses federated searching to search those databases?
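
As a thought experiment, here's roughly what 'pulling' a subject's databases out of a central store and into a guide might look like. A sketch only: the SQLite store, table layout and subject tags are all hypothetical. The point is that a name or IP change then becomes a single edit in the store rather than a hunt through multiple pages.

    # Thought experiment: generate the 'Databases' section of a subject guide from a
    # single central store instead of hand-maintaining static lists.
    # The SQLite database, table layout and subject tags are hypothetical.
    import sqlite3

    def databases_for_subject(db_path: str, subject: str) -> str:
        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT name, url FROM databases WHERE subject = ? ORDER BY name",
            (subject,),
        ).fetchall()
        conn.close()
        items = "\n".join('<li><a href="{0}">{1}</a></li>'.format(url, name)
                          for name, url in rows)
        return "<ul>\n" + items + "\n</ul>"

    # e.g. print(databases_for_subject("eresources.db", "Accounting & Finance"))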

And if you are happy with that model can we transfer it to the other eresources currently listed in the resource guide pages? Could we create or use an existing database to manage/store web sites and draw on them to populate resource listings?

Well of course we could. To see how it might look take a look at the PHP/MySQL application
PirateSource developed by the Joyner Library at East Carolina University, also used by Curtin University of Technology.

What's missing from PirateSource is the ability to search the listed resources as a job lot. Which leads me to the next bit of this spiel: Google Custom Search. With GCS you can tell Google exactly which sites you want hits returned from - in effect an expansion of using Google to search one site with the 'site:xxxx.edu.au' restriction.

As an experiment I've created a GCS that restricts results to the websites listed on our Accounting & Finance Guide (it does not include the databases, ejournals or ebooks listed, only the web sites in the last four categories) - take it for a spin. The results can also be 'iframed' inside an institutional page - which I haven't done at the time of writing, but may have done by the time of reading.

Of course we are then back to maintaining separate lists of web resources, aren't we? Not necessarily. If we could store all those web sites in ERM, with enough metadata to retrieve them, and if the Serials Solutions API is up to it, we could have one central database of resources that could populate subject guides dynamically with appropriate resources, and we could even have an option to search the retrieved resources simultaneously.

Except that searching databases, ejournals and ebooks would be one federated search, and all other web resources would be another federated search (Google Custom Search). The multitude of subject-specific GCSs would have to be maintained semi-manually - a cut and paste of the selected URLs (one per line) into the GCS 'Sites to search' box - though even that list could be generated from the central store, as sketched below.
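
A small sketch of that generation step, reusing the same hypothetical central store as the earlier sketch:

    # Sketch: pull a subject's selected web sites from the hypothetical central store
    # and print them one per line, ready to paste into the GCS 'Sites to search' box.
    import sqlite3

    conn = sqlite3.connect("eresources.db")
    for (url,) in conn.execute(
            "SELECT url FROM web_resources WHERE subject = ? ORDER BY url",
            ("Accounting & Finance",)):
        print(url)
    conn.close()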

I propose all this as a talking point sparked by Helen Hooper showing me Curtin's subject guides. If you are interested in learning more please let me know.

Monday, February 18, 2008

VALA 2008 Report Back: Repositories, research and reporting: the conflict between institutional and disciplinary needs - Danny Kingsley

Original Paper
Danny reported some of the findings she'd made while researching her dissertation on barriers to academics' use of institutional repositories.
Apparently, across the world, repository deposit has stagnated at around 15% of all academic output.
This issue became a recurring theme at VALA (with many carrots and sticks being hurled around), but Danny's paper offered a fresh insight because:
  1. She isn't a librarian
  2. The information was largely based on one-on-one interviews with academics, so in a sense it's from the horse's mouth - I do worry we don't spend enough time in the stable (we'd probably scare the horses anyway).

She grouped her academics by discipline which highlighted how important it is to know the needs of the groups you're working with. She interviewed a fairly large sample of academics from three disciplines (Chemistry, Sociology and Computer Science) about their information seeking behaviours with a view to how digital repositories fitted with those behaviours.

Information Seeking Behaviours by Discipline

Main sources of information and publication target
  • Chemistry: Journals
  • Sociology: Journals & monographs
  • Computer science: Conference papers

Keeping tabs on developments in the field
  • Chemistry: Systematic approach (TOCs of key journals)
  • Sociology: Specific conferences
  • Computer science: Serendipity

Researching a new topic
  • Chemistry: Use databases rather than general searches (SciFinder); embarrassed by using Google
  • Sociology: Snowball mixture of text and web, following footnotes, browsing
  • Computer science: Almost exclusively use Google - "can't live without it"

Researchers working in the same sub-discipline
  • Chemistry: "The number of people in my absolute finite area is in the 10's. In the general area it is in the 1000's. I keep an eye on about 20 people and there is 10-15 with a broader interest I keep an eye on."
  • Sociology: "It's a very small pool in Australia. There are only 5-6 people at the top."
  • Computer science: "I know most of the people active in my field, they send me their work. About 12-20 people."

Danny discussed the barriers to academic use of repositories and how they might be overcome. Some were simple usability problems, like 'how easy is it to deposit something?' and 'is it easy to find the repository?'. Others were more complex, like balancing institutional reporting requirements with academics' greater 'loyalty' to a research community than to an institution.

Even more insidious is the American Chemical Society's practice of refusing to publish an item pre-published in a digital repository. She gave an example of an institution finding ways around this sort of barrier: QUT's approach of having links from their repository to RePEc, so that the institutional repository doesn't dilute the hits on RePEc, which are an important signifier of reputation in the field of Economics.

The overall message was that academics in different fields have different needs and to attract them to using the institutional repository you have to:

  1. Understand their needs faculty by faculty
  2. Offer them something better than what they already have (say the ability to link or embed a dynamically created publications list or download counts)
The problem of getting academics to use institutional repositories was revisited numerous times during the conference. With the RQF 'stick' Danny's call for more 'carrot' was timely.

Wednesday, February 13, 2008

VALA 2008 Report Back: Repositories thru the looking glass - Andy Powell

There are many methods for predicting the future. For example, you can read horoscopes, tea leaves, tarot cards, or crystal balls. Collectively, these methods are known as "nutty methods." Or you can put well-researched facts into sophisticated computer models, more commonly referred to as "a complete waste of time." Scott Adams

Andy has a long history with Eduserv and was the principal technical architect of the JISC Information Environment. He has been active in the Dublin Core Metadata Initiative for a number of years. Andy jointly authored the DCMI Abstract Model and several other Dublin Core technical specifications. More recently he jointly authored the DC Eprints Application Profile for the JISC. He was also a member of the Open Archives Initiative technical committee.

With a background like that it was surprising that he opened his talk by saying he thought we'd gone down the wrong path with institutional repositories (he pre-disclaimed that these were things he was pondering lately, and were not the thoughts of his employers).

His key ideas were:

  • Repositories have largely ignored the web
  • Too much focus on the word ‘repository’ rather than servicing content on the web
  • What’s the difference between a repository and a cms?
  • If we focused on content management we would stop talking about OAI-PMH and start talking about search engine optimization
  • We are service oriented not resource oriented
  • Our institutional focus:
    • Is contrary to the nature of research and research communities
    • Makes web 2 apps unlikely because of small user communities
  • In some areas even a national focus is not enough and we should be approaching it globally

So what does Andy think a web 2 repository would look like?

He freely acknowledged the 'cons' of this approach:

  • No preservation
  • No complex workflows
  • Don’t expose rich metadata
  • Author searching and citation counting are not handled well by the current web

Having seemingly dismissed his own work in the area of repositories he went on to discuss what was good about the 'librariany' approach to repositories:

  • eprints and SWAP (the Scholarly Works Application Profile)
  • FRBR offered a sound basis for identifying the multitude of versions of research, e.g. preprint vs peer-reviewed published PDF

The key points I got from his wrap up were:

  • Repositories don’t work with the real social networks used by academics
  • Open access is inevitable, we should focus on ‘making content on the web’ not ‘putting content in repositories’
  • The future lies in resource orientation, REST, and the semantic web