Showing posts with label hints. Show all posts

2007-11-01

10 Dos and Don'ts When Using Microformats Parser

MicroformatParser has been used in the real world (i.e. outside my sandbox testing grounds) for some time now, and I've been getting valuable feedback from developers. During that time, some of the most common problems - and some of the best practices to circumvent them - have emerged, and I thought it would be nice to collect them all in one place to share with others.

Dos

Please, do:

... use Tidy

The web is filthy, and you do need something to keep you clean. You can't just assume that you're working with well-formed XML from an external source - 9 out of 10 times the XML parser will choke and your script will croak because of that assumption.

What you can do is try to decrappify the input using Tidy. For a kick-start on using Tidy with PHP, you may want to check out this post as well.

... check your PHP version

For PHP4, everything should just work right out of the box. However, for PHP5 you'll need this script, by Alexandre Alapetite. He's done a great job of wrapping the DOM XML extension API, making it available to PHP5 users.

... check xArray documentation

It may be tempting to just call the toArray() method on the result and work with a familiar datatype. However, xArray is specifically crafted to facilitate working with collections of objects, such as your parsing results. The documentation is included in the package, and you can re-run PhpDocumentor over the source file to get it in a format you prefer. For more info on xArray you can also check out the documentation wiki. It is a work in progress, but some valuable info is already there.

Also, there is a new xArray version on the way (v0.2), which will make handling complex trees of data even easier.

... check if (bool)FALSE is returned

On error, MicroformatParser returns (bool)FALSE instead of an xArray object. So make sure that everything went OK before you try to do anything further with the result:

if($microformatsResult) ...

... use caching

Fetching the remote page will most likely be the slowest part of your script (if it's not, something is seriously wrong). So, to shorten the execution time, implement some sort of caching mechanism to keep remote page fetching to a minimum.
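To make the idea concrete, here is a minimal sketch of a time-limited cache. It is written in Python rather than PHP purely for illustration, and it is not part of MicroformatParser: the PageCache class, its parameters and the fetch callable are all hypothetical names. The slow remote fetch is passed in as a function, so the cache itself stays independent of how the page is retrieved.

```python
import time


class PageCache:
    """Tiny in-memory cache keyed by URL with a time-to-live (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self._fetch = fetch        # callable(url) -> page text; does the slow remote fetch
        self._ttl = ttl_seconds    # how long a cached page is considered fresh
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # url -> (fetched_at, page)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: skip the remote fetch
        page = self._fetch(url)    # missing or stale: fetch again and remember it
        self._entries[url] = (now, page)
        return page
```

A script that runs once per request would need to keep the entries somewhere persistent (a file or a database) instead of a dictionary, but the freshness check stays the same.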

... contact me

This isn't really a "best practice" thing, but I think it's still worth keeping in mind. If you find a bug or just keep hitting the wall, don't hesitate to contact me. I'll try to help as much as I can.

Don'ts

There aren't as many of those, but they're just as important. So, please don't:

... assume you're parsing well-formed XHTML

Because it's just not true, most of the time.

... use PHP5 DOM XML extension

As of PHP 5.0, the required DOM XML extension is not bundled with PHP anymore. There is one available from PECL, but you don't want to use that. Thanks to deneme's patience and valuable input we discovered that you can't really plug it in and expect everything to work. You should keep away from it and use Alexandre Alapetite's solution instead.

... use it for something malicious

I can't really tell you what to do with it, but please don't use it for something bad, like email scraping. Would you like your name and email listed in some new directory handed down to generations of spammers? No, I bet you wouldn't. So don't do it to others, either.

... output invalid XML (XHTML included)

This is not strictly related to MicroformatParser usage, but it's good advice nevertheless. Please, don't do that. The rest of the web will thank you for your effort.

2007-05-19

Advanced tips for Total Commander

As I hinted in my previous post on Total Commander, there are lots of ways it can help you in your day to day tasks. There are all sorts of addons available, but you can make it do even more - a little bit of shell scripting and documentation reading goes a long way here, because 5 minutes spent today can (and will) save you hours in the future.
There are a couple of (easy) ways you can extend TC functionality with your own actions, tailored to suit your workflow. You can either use TC internal commands, or make your scripts accessible from the button bar, the Start menu, or the directory hotlist. The button bar and the Start menu have one big advantage: they give you special runtime parameters that you can pass to your scripts. However, I find these harder to access than the directory hotlist. I have Ctrl+D hardcoded in my fingers and it's right in front of my eyes all the time, so I'll stick to it.

The Directory Hotlist menu

Sure, we all add our frequently used directories there, and some of us perhaps even organize them in sections. But we can make it do more useful things - there are a lot of builtin TC commands available from there, and anything that can be executed from the shell can be executed from there as well. Just open the menu (either with your mouse, or by pressing Ctrl+D), and select Configure. In the window that opens, click the Add Item button and enter the name for your new action. Then, either select an internal command from the Command: dropdown, or enter the shell command you wish to be executed. For example:
cmd /c "tree > tree.txt"
This will create a file named tree.txt in your currently open directory, with the tree of its subdirectories. OK, so this wasn't very useful but you get the picture. Moving along.
Now, suppose I have some movies on my HD that I want to know more about. It is quite trivial to find the information on the 'net - I'd just go to IMDb and do a search for the title. However, it is a multi-step operation: open my browser, type the address, type the movie title, hit search. Obviously, that should be easier - I mean, that's why we have computers in the first place, right? Right. So let's make a script to automate this task for us and make it easily accessible in the TC hotlist.

What you sed is what you get

Note: you will need some special tools for this - namely, sed and gawk (you should have them around anyway - it's really good stuff). Fortunately, they're freely available as a part of UnxUtils package, a porting effort of some common GNU utilities to native Win32.
After you download the archive, unpack sed.exe and gawk.exe from usr/local/wbin to a directory in your path (e.g. C:\WINDOWS).
Thinking about it, almost every movie I encountered was in a directory named after its title. So let's use that as the starting point:
cd | sed "s/.*\\//"
We use the cd output to find out the current working directory. Unfortunately, cd outputs the full path (e.g. D:\Desktop\Batch) instead of what we need. That's why we send the output to sed - this will replace everything up to the last \ with nothing (if you'd like to know more about sed, there are a lot of good tutorials around. You may want to try this one).
There is yet another problem: cmd.exe is not cool at all. We can't just assign the output we just got to a shell variable in order to use it later - not in a straightforward way (using SET), anyway. However, there is a way to do just that:
cd | sed "s/.*\\/set directory_name=/" > tmp_.bat
call tmp_.bat
del tmp_.bat
The code above will replace everything up to the last \ from the cd output with set directory_name= and dump that in a file named tmp_.bat, so the newly created file would contain something like:
set directory_name=Whatever Directory Name
Next, when we CALL tmp_.bat, it will import the directory_name environment variable into the current environment. After that, we don't need the intermediate script anymore, so we delete it (using DEL). And the hard part is actually over: the only thing left is to do the search. Fortunately, IMDb accepts GET search requests, so this is trivial:
"C:\Program Files\Mozilla Firefox\firefox.exe" "http://www.imdb.com/find?s=tt&q=%directory_name%"
This will open Firefox at the given URL - which is the IMDb search results we were after all along. Now, we wrap this up in a batch file:
cd | sed "s/.*\\/set nme=/" > tmp_.bat
call tmp_.bat
del tmp_.bat
"C:\Program Files\Mozilla Firefox\firefox.exe" "http://www.imdb.com/find?s=all&q=%nme%"
and save it somewhere. Next, we do Ctrl+D -> Configure -> Add Item, enter a descriptive name for our new action, and enter this at the Command: line:
cmd /c "Full_Path_To_Your_Batch_File.bat"
Every time you execute your new action, the script will run in the current directory - the one you're browsing in the currently active TC pane.
Note: there may be more elegant ways of doing such things, but I like to keep my scripts hackish and simple. For me, a shell script is a quick and dirty way out - it should be intuitive to write and quickly provide the results without too much hassle. Anything that needs to be elegant should be done in a proper programming language anyway.
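For comparison, here is a minimal Python sketch of the same idea. It is not part of the batch script above, and the function name imdb_search_url is made up for illustration. It takes the last component of the current directory and builds the IMDb search URL used in the batch file. Unlike the batch version, it also percent-encodes the title, so spaces and ampersands in directory names survive the trip.

```python
import ntpath
from urllib.parse import urlencode


def imdb_search_url(current_dir):
    """Build the IMDb search URL from a Windows-style directory path."""
    # ntpath handles backslash-separated paths on any platform;
    # this mirrors what sed "s/.*\\//" does to the output of cd.
    title = ntpath.basename(current_dir.rstrip("\\/"))
    # urlencode escapes spaces, ampersands and other special characters.
    return "http://www.imdb.com/find?" + urlencode({"s": "all", "q": title})


if __name__ == "__main__":
    import os
    print(imdb_search_url(os.getcwd()))
```

The standard library's webbrowser.open() could be given the resulting URL to open it in the default browser instead of printing it.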

Search for more info on MP3s

I like to know stuff about what I'm listening to. There is a lot of information available on the web about different artists and albums, and I find myself often doing repetitive work to get to it. However, it is quite easy to get all (or most) of it at once with a simple script, similar to the one we just did.
However big your MP3 collection is, chances are that most of the files follow one of the most common folder schemes - either ArtistName\AlbumName or ArtistName - AlbumName. The approach is very similar for both cases: we'll use gawk to extract the information from the path provided by the cd command. First, the ArtistName\AlbumName case:
cd | gawk -F\\ "{print (\"set artist=\" $(NF-1) \"\nset album=\" $NF)}" > tmp_.bat
We'll use the same method to turn the command output into environment variables, but with a different tool (gawk) to get it done. First we declare \ to be the field separator - that means that gawk will split whatever we give it into separate fields at each \. We can access a field by putting $ in front of its number, counting left to right ($1 for the first one, $2 for the second, etc). Since we want the one on the far right, we use NF - it means number of fields in the gawk lingo. We also need the one just before the last, so we subtract 1 from the number of fields - that's what $(NF-1) means.
If none of this makes sense to you, don't worry. Awk (gawk is just a flavor of awk) is very well documented, and there are a lot of tutorials around. The basics are covered here, and if you'd like to learn more, try this one.
We also need to print it, and therefore we need double quotes. That is why we escape the quotes in the print statement by adding \ in front of them. This way, cmd.exe will leave those quotes alone, and let gawk parse it instead. Note that we need each SET statement to be on the separate line in the intermediate batch file. That's why we separate them with \n, which means "insert newline here".
The ArtistName - AlbumName case is quite similar. We'll build on what we got from the previous scripts:
cd | sed "s/.*\\//" | gawk -F- "{print (\"set artist=\" $1 \"\nset album=\" $2)}" > tmp_.bat
So, first we extract the last directory in the current path, just like in the IMDb search script. Next, we declare '-' to be the field separator because we want to split the directory name at the dash character. Since we assume that the artist name is on the left side of the dash and the album name on the right (we'll be doing a web search, so we don't have to get surgical about this), we can simply use $1 and $2. Note that $2 holds only the second field, so an album name that itself contains a dash is cut off at that dash.
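The two splitting rules can also be sketched in Python, which makes them easy to try on a few sample paths. This is an illustration only: the function names split_nested and split_dashed are hypothetical, and the dashed variant deliberately differs from the gawk one-liner in two ways - it trims the spaces around the dash and keeps everything after the first dash as the album name.

```python
def split_nested(path):
    # ArtistName\AlbumName: take the last two fields, like $(NF-1) and $NF in gawk.
    fields = path.split("\\")
    return fields[-2], fields[-1]


def split_dashed(path):
    # ArtistName - AlbumName: keep only the last path component,
    # then split it at the first dash.
    name = path.split("\\")[-1]
    artist, _, album = name.partition("-")
    # Trim the spaces left over on either side of the dash.
    return artist.strip(), album.strip()
```

For a path such as D:\Music\Artist Name\Album Name the first function applies, and for D:\Music\Artist Name - Album Name the second one does. Both return the pair (artist, album).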
Once we have our %artist% and %album% variables set in the intermediate batch file, the rest of the script is the same for both cases. We use the old technique to get to them and then we put them to use:
REM ...
call tmp_.bat
del tmp_.bat

REM +---------------------------------------------------------------------------------
REM   This is where we set up search URLs.
REM   Comment out the ones you don't need, or add some more.
REM +---------------------------------------------------------------------------------
set discogs="http://www.discogs.com/search?type=artist&q=%artist%"
set cduni_artist="http://www.cduniverse.com/sresult.asp?HT_Search_Info=%artist%&HT_Search=ARTIST"
set cduni_album="http://www.cduniverse.com/sresult.asp?HT_Search_Info=%album%&HT_Search=TITLE"
set gimg_artist="http://images.google.com/images?q=%artist%"
set gimg_album="http://images.google.com/images?q=%album%"
set gimg_both="http://images.google.com/images?q=%artist% %album%"
set wiki="http://en.wikipedia.org/wiki/Special:Search?search=%artist%"

REM +---------------------------------------------------------------------------------
REM   OK, we're all set, let's do some searching:
REM +---------------------------------------------------------------------------------
"C:\Program Files\Mozilla Firefox\firefox.exe" %discogs% %cduni_artist% %cduni_album% %gimg_artist% %gimg_album% %gimg_both% %wiki%
Now, we just substitute the REM ... line with the appropriate gawk command, and the script is ready to go into the directory hotlist as a brand new action.
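As a rough illustration of what the script assembles, the following Python sketch builds the same list of search URLs from an artist and an album name. It is not a replacement for the batch file: SEARCH_TEMPLATES and search_urls are made-up names, the URLs are copied from the script above, and the percent-encoding of the names is an addition the batch version does not have.

```python
from urllib.parse import quote_plus

# One template per search; remove the ones you don't need, or add some more.
SEARCH_TEMPLATES = [
    "http://www.discogs.com/search?type=artist&q={artist}",
    "http://www.cduniverse.com/sresult.asp?HT_Search_Info={artist}&HT_Search=ARTIST",
    "http://www.cduniverse.com/sresult.asp?HT_Search_Info={album}&HT_Search=TITLE",
    "http://images.google.com/images?q={artist}",
    "http://images.google.com/images?q={album}",
    "http://images.google.com/images?q={artist}+{album}",
    "http://en.wikipedia.org/wiki/Special:Search?search={artist}",
]


def search_urls(artist, album):
    """Fill every template with the URL-encoded artist and album names."""
    values = {"artist": quote_plus(artist), "album": quote_plus(album)}
    return [template.format(**values) for template in SEARCH_TEMPLATES]
```

Each resulting URL could then be handed to a browser, for instance with webbrowser.open() from the standard library.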

2007-04-20

Using Total Commander

Although I'm not very keen on Shareware, Total Commander is the file manager for Windows, and for me nothing comes even close to it. It has many useful features, both basic and advanced, to make your life so much easier. It's extensible, too: there are plugins for file undeletion, accessing ext2, ext3 and reiser partitions, opening bz2, rpm, iso, icl, msi files, fulltext searching in PDFs, accessing your POP3/SMTP server, Symbian telephone or iRiver flash player, etc. And it can be copied to your USB stick, so you can carry it around with you.

Even without a single addon, it is by far the most useful file manager I have ever used. It has internal zip/gzip archive support, advanced renaming options for multiple files, file/directory comparison, different views and filters, FTP support, great search capabilities, and a two-paned interface with tabs, history and configurable bookmarks.

Actually, configurable bookmarks are perhaps the most valuable, and yet the most underrated feature of this tool. While you can just add locations you frequently visit (by clicking on the "Add current dir" option in the dropdown list), you can add some really useful actions there as well, while keeping everything neatly organized, hierarchically. This can be done by clicking on the "Configure" option at the bottom of the dropdown list in a simple and convenient interface.

While adding directory locations and organizing the menu are pretty obvious tasks, adding actions is not as straightforward (still, it is quite easy). You can add any kind of action that you'd do in a shell - for example, you can make a quick file list text file with this action:

cmd.exe /c "dir /b /oG > listing.txt"

Of course, that would list all the files in the current working directory - the one that's open in your active TC pane. Hint: you may want to consider installing Unix utils somewhere in your path for even more power. For example, you could generate a track list for an audio CD cover from your mp3 folder with an action command like this:

cmd /c "dir /B /oG *.mp3 | sed s/\.mp3\b//i | gawk "{print NR \") \" $0}" > list.txt"
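What the pipeline above produces can be sketched in a few lines of Python. This only illustrates the transformation, with a hypothetical function name track_list: keep the .mp3 files, drop the extension (case-insensitively, like the i flag in the sed expression), and prefix each line with its number, as gawk does with NR.

```python
import os


def track_list(filenames):
    """Turn a directory listing into numbered track titles."""
    # Keep only the mp3 files and strip their extension.
    tracks = [os.path.splitext(name)[0]
              for name in filenames
              if name.lower().endswith(".mp3")]
    # Number the tracks starting from 1, in the order they were listed.
    return ["%d) %s" % (number, title)
            for number, title in enumerate(tracks, 1)]
```

Passing it the result of os.listdir() for the album folder, sorted if needed, and joining the output with newlines gives the same kind of text as list.txt.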

As of v5.51, you can also choose one of Total Commander's internal commands from the dropdown combobox - e.g. cm_OpenDesktop to switch to the Desktop folder. Even more useful internal commands (at least, the ones I use all the time) include:

cm_CopyNamesToClip
Puts name(s) of selected file(s) without path information on the clipboard - my girlfriend uses this all the time to make Photoshop audio CD covers. This works on directories as well.
cm_CopyFullNamesToClip
Puts full path(s) of selected file(s) on the clipboard. This also works on directories.
cm_CountDirContent
Quickly calculate the total size of one or more selected directories.
cm_SelectCurrentExtension
Selects files with the same extension as the one that's currently selected.

Now, I could just install about a dozen (or more) other tools instead and do all the stuff I do with TC, but it's so great having all that in just one tool - it keeps my workflow uninterrupted, my desktop/quicklaunch/start menu uncluttered and my fragile spiritual balance undisturbed.

Of course, there are downsides: TC is a Windows-only tool (although it has been ported to Windows CE/Pocket PC). Also, it is a Shareware program, which means that you can test it for a period of 30 days. After testing the program, you are expected to either order the full version, or delete the program from your harddisk. Ouch.

There are a few freeware alternatives - one of the most usable ones I tried is FreeCommander. Unfortunately, it has a much more limited set of builtin features and no plugin support - though it still beats the hell out of Explorer.

In the free world, many people - including me - feel that mc (Midnight Commander) is still the best file manager there is. However, a relatively recent project looks quite promising: GNOME Commander reached version 1.2.3 (stable), with plugin support announced to be introduced in version 1.3.