Friday, July 1, 2011

Bias, Search Strategies, and Converting MS-Word documents to MediaWiki pages

It's interesting that shortly after I posted about the "taboo" heuristic, I found a real situation that warrants a revision.

A couple of days ago, I decided I wanted to quickly convert a lengthy Microsoft Word document to the MediaWiki format. I couldn't find a way to do it. Today, I stumbled across a way to do it while searching for something else (a way to automatically upload images to MediaWiki). I thank Google for remembering previous searches: Google uses these to refine my current results, and I also have a record of what didn't work and what might have worked.

Here are the search terms that were fruitless:
* Convert HTML to wikipedia wiki
* html to wiki converter

The page I stumbled on later was this page, which explains that OpenOffice can export to MediaWiki format. It never occurred to me to search for the word "export". Why not? I can come up with a variety of reasons:
* I was looking for a converter. I didn't think to question if there were other words to describe my goal, or other ways to accomplish it.
* I was looking for something specific: going from HTML to wiki formats. I assumed that HTML was the most likely format from which to start. That assumption was wrong.

This implies something about Google search strategies. As Jakob Nielsen says, people rarely change search strategy. In that article, he explains some of the problems with search skills, and how search engines can be made more usable.

In my case, I keep thinking "what heuristics can I come up with, that will help my search strategy in similar situations in the future?"

I have other search problems, too. What happens when I want to find a fun new game to play, but I have no idea what I like? What happens when I don't have a specific thing in mind, and I just want Father's Day gift ideas?

In order to find out what heuristics work for changing a search strategy, I will need to come up with a new search problem, and a list of heuristics that might help.

New Search Problem (test my new heuristics)
I have been looking for an easier way to upload images to Wikipedia. I haven't found one yet, despite trying in the past. So that will be my test case for a new heuristic. If it helps me solve my problem, I will count it as a success. There is a risk of a false negative: a good heuristic could still come up empty on this particular problem. Also, the heuristic that works this time might not work for other search problems. I don't plan to address that here.

From here on, I am chronologically showing my thought processes and search progress.

New Heuristics to try on my new search problem
I'm brainstorming ways to re-frame a search problem.
Here are some of the heuristics I thought of to try, and things that worked with the previous example:
* De-focus. Make my search terms less specific. Avoid exact file types.
* Find synonyms. Use a thesaurus to find similar words and actions, related to what I want to do. For "convert", I might need to do some actual research to find alternative ways of expressing what I want. Some examples might be "save as", "upload", "export", and "convert".
* Find related actions. Break down the steps of what I want to do, by zooming out and then zooming in. With "convert html to wiki", I could zoom out to my main goal: get a formatted document into MediaWiki without having to re-work the formatting manually. My starting document is MS-Word, but it could be in any of the formats that an office editor can save to. This sparks some ideas: maybe I can use OpenOffice, AbiWord, WordPad, or Notepad. Maybe I have to use several in a row to get to a format I like (one being a PDF). With so many options, I need to focus on the destination file type first: MediaWiki. So I should search Google for "to mediawiki format", or some variants.
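
The synonym heuristic above can be sketched in a few lines of Python. This is only an illustration: the synonym table is hand-made from the words mentioned in this post, and the function name `query_variants` is hypothetical, not part of any search tool.

```python
from itertools import product

# Hand-made synonym table (an assumption for illustration); the alternatives
# are taken from the examples discussed in the post.
SYNONYMS = {
    "convert": ["convert", "export", "save as"],
    "html": ["html", "word", "document"],
}

def query_variants(query, synonyms):
    """Return every query formed by swapping each word for its alternatives."""
    options = [synonyms.get(word, [word]) for word in query.lower().split()]
    return [" ".join(choice) for choice in product(*options)]

variants = query_variants("convert html to mediawiki", SYNONYMS)
for variant in variants:
    print(variant)
```

Each printed line is a query to try by hand. The list grows quickly as the table grows, so it is probably best to keep only a few alternatives per word.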

What happened?
As I continued searching, I came up with other ideas for switching up my search strategy. This is probably the most important heuristic of all: actively thinking of ways to break out of a box.

Here are some of the strategies I tried. (Remember, I was looking for a way to quickly upload images to Wikipedia, specifically while I am already in the middle of editing an article.)
* Think in terms of a less-tech-savvy user. What questions would they ask? Or instead, what complaints would they make? I searched for "too hard upload image wikipedia"
* Think in terms of the basics. "How to add image to mediawiki OR wikipedia" (less success)
* Think in terms of alternate related scenarios that don't quite fit my goal. "I need to upload a ton of images to mediawiki" (this seemed more successful; the results looked more promising)
** Other scenarios involve automatically uploading images to other sites, integrated browser features, wiki image editors, and so forth.
* Look further down in search results (not much success)
* Click on something that doesn't look helpful
* Look at words in search results to get ideas for more search terms. (resulting terms: upload, script, import, robot)
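
The last strategy in the list, mining result pages for new search terms, could be roughed out like this. It is a sketch under stated assumptions: the titles are made up for illustration (they are not real search results), the stopword list is a small hand-picked set, and `candidate_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption, far from exhaustive).
STOPWORDS = {"a", "an", "and", "for", "how", "in", "of", "the", "to", "with"}

def candidate_terms(query, result_titles, top_n=5):
    """Return the most frequent title words that are not already in the query."""
    seen = set(query.lower().split()) | STOPWORDS
    counts = Counter()
    for title in result_titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in seen:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]

# Made-up titles for illustration only.
titles = [
    "Upload script for MediaWiki images",
    "Bulk import images with a robot script",
    "MediaWiki import robot",
]
terms = candidate_terms("upload images mediawiki", titles)
print(terms)
```

Words that recur across results but are missing from the query are the ones worth trying next.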

Results:
It looks like there are basic robots I can use to do what I want, but a complete solution isn't available. I wanted to be able to paste from my clipboard into Wikipedia, as quick and simple as that. The closest thing I found was http://meta.wikimedia.org/wiki/Pywikipediabot/upload.py. Perhaps I will have to build my own tool? By searching for "paste images to mediawiki", I found this FogBugz feature request, which shows me that others have been unsuccessful in finding a similar tool (so far).

Analysis:
One of the best Google searches I tried, near the end, was "clipboard mediawiki image". I left out verbs entirely. Perhaps one search strategy is to eliminate all the verbs, nouns, or adjectives, or to search on only the nouns, verbs, or adjectives that will be most useful. Also, since nouns are more concrete, perhaps they are more likely to succeed. With this search strategy, I found a list of very promising results, which kept me looking further down the results page; unlike with other searches, I even went to the second page of results. This led to a near hit: someone bragging that they had made a Windows tool for the job. However, there was no link to the project, and I couldn't find the project by name or description after that...

Conclusion
You have to use a variety of search strategies, and I am not sure what works best. It seems like searching for alternate scenarios, and reading more than the first screenful of promising results, was most helpful. Finding alternate words didn't help a lot in this case, but it's hard to prove it won't help in other circumstances. Dropping verbs helps. Looking for words and terms to drop (or change and generalize) is often more difficult and more helpful than looking for search terms to add.

In the end, I decided that the easiest solution to my particular problem was to just use the current MediaWiki file upload features. However, it was a useful introspective study of some ways to change search strategies. Maybe I will compose a list of search strategy heuristics as I learn more.

Edit/Addendum
It occurred to me weeks later that I never thought about trying alternate search engines, specialized search engines, or just asking a question on a forum like superuser.com.

I also thought of a couple more search strategy changes that generalize on what I stated above.
  • Search by a problem statement
  • Search for similar problems that are not your problem (e.g. batch upload, batch remove, etc)
  • Search by solution keywords
  • Eliminate key search terms, eliminating assumptions
  • Add key search terms
  • Find synonyms. Use a thesaurus or a reverse dictionary.
  • Generalize any and every search term (e.g. mediawiki->wiki->upload/paste/...)
  • Get more specific on any search term (e.g. mediawiki->wikipedia->wikipedia:image ooh, wikipedia:file...)
  • Drop (all) search terms by grammatical type (verb, noun, adjective)
  • Switch to different grammar parts (switch from verbs to nouns, etc)
  • Try to think like someone else (less tech savvy user)
  • Ask if a search engine is the right venue. (e.g. forums, libraries, real people, etc)
  • Ask if this search engine is the right venue. (find other search engines, specialized or not)
  • Search for other search terms using results pages, and especially forum conversations. This takes more time.
  • Keep looking "here": Follow links that don't look promising. Look at several more pages of results.
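
Two of the strategies in this list, dropping terms by grammatical type and generalizing a term, are mechanical enough to sketch in Python. The part-of-speech tags are assigned by hand here (an assumption; no language-processing library is involved), and the names `drop_by_type`, `generalize`, and `LADDER` are hypothetical.

```python
# Each query term is tagged by hand with its grammatical type (an assumption).
TAGGED_QUERY = [
    ("paste", "verb"),
    ("clipboard", "noun"),
    ("mediawiki", "noun"),
    ("image", "noun"),
]

# Hand-made "generalization ladder": each term maps to a broader one.
LADDER = {"mediawiki": "wiki"}

def drop_by_type(tagged_terms, kind):
    """Build a query that leaves out every term of the given grammatical type."""
    return " ".join(word for word, tag in tagged_terms if tag != kind)

def generalize(query, ladder):
    """Replace each term that has a broader alternative with that alternative."""
    return " ".join(ladder.get(word, word) for word in query.split())

no_verbs = drop_by_type(TAGGED_QUERY, "verb")
broader = generalize(no_verbs, LADDER)
print(no_verbs)
print(broader)
```

Dropping the verb from this tagged query gives back the "clipboard mediawiki image" search described in the analysis above.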
EDIT:
In a conversation, a friend pointed out that another search re-framing heuristic is to find other information sources: switch search engines, find a specialized search engine, or look at the bibliographic data of unavailable or inapplicable solutions to find similar sources or ideas.

Also, I recently read Kaner's document on ET, where he suggests using the CIA Phoenix Checklist for problem solving. I think it's very applicable. You will probably find it if you search around a bit.
