Ken Hughes... - PowerShell
Productivity, Technology and Automating Everything...
    
 

I was discussing the ‘googlability’ of our knowledgebase (a new word I made up meaning ‘the ability to find via Google’) with one of the technical guys at work.
It seems that we seldom get matches in Google searches (and the built-in search is somewhat lame). I was quite surprised that Google wasn’t matching anything.

Looking into it a bit further, I found that although our knowledgebase is public, the Urls are pretty undiscoverable, all having a ‘kbarticleid’ parameter. Obviously, the GoogleBot couldn’t just guess at the values and so was skipping the majority of our articles, apart from the few listed on the main page.

We needed to give it some hints by adding a sitemap. I (ever so) briefly toyed with adding a sitemap page to the knowledgebase website using the standard XML based sitemap protocol etc, but our site is written in PHP and I didn’t want to get bogged down in all that again…
In a rare burst of being pragmatic and keeping things simple (as opposed to _way_ over engineering a solution) I recalled that Google’s webmaster tools allow you to submit a text file as a sitemap with one Url per line.

I knew the format of the Url for our articles, so it just required a bit of PowerShell to generate a bunch of lines containing Urls with sequential numbers and write them to a file. Version 1 looked like this:

set-content "c:\sitemap.txt" (1..1000 | %{ "http://support.c2c.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=$_&nav=0`n" })

However, uploading this sitemap caused the Google machine to choke and spew out a bunch of errors about invalid Urls… A little more digging uncovered that the uploaded text file must be encoded in UTF-8. So version 2 looked like this:

set-content "c:\sitemap.txt" (1..1000 | %{ "http://support.c2c.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=$_&nav=0`n" }) -encoding UTF8

Out popped a text file with 1000 Urls, in the correct format, with the correct encoding and accepted by the Google machine with no problems.
Probably 10 minutes’ work all in. I wouldn’t have even got the PHP coding tools fired up in that time. Reminder to self: “KISS works!!”
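
For anyone wanting to reuse the idea, here is a minimal sketch of the same approach wrapped in a function. The function name New-SitemapLines and the temp file name are made up for illustration. It also swaps Set-Content for [System.IO.File]::WriteAllLines, so that the file is written as UTF-8 without a byte-order mark; that is an assumption about what a fussier consumer might want, not something the upload above needed.

```powershell
# Hypothetical helper: emits one article Url per ID. No trailing newline is
# added, because each string is written out as its own line anyway.
function New-SitemapLines([int]$first, [int]$last)
{
    $first..$last | ForEach-Object {
        "http://support.c2c.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=$_&nav=0"
    }
}

$lines = @(New-SitemapLines 1 3)

# Illustrative output location; change to wherever the sitemap should live
$path = Join-Path ([System.IO.Path]::GetTempPath()) "sitemap-sketch.txt"

# UTF8Encoding($false) means "UTF-8 without a byte-order mark"
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllLines($path, [string[]]$lines, $utf8NoBom)
```

Raising the second argument to 1000 reproduces the range used in the one-liner above.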

Posted: Friday, January 15, 2010 2:40:54 AM (GMT Standard Time, UTC+00:00)  #   Comments [1]
TAGS: PowerShell | Web

Disclaimer: Screenscraping results like this probably contravenes Google’s Terms of Use (or something) and I do not advocate that you do it – this is purely hypothetical; if I did want to do it, this is how I would go about it  ;-)

Further Disclaimer: The results page formats could change at any time and may well break this script, if that happens you are on your own (FireBug and some modified regex should help you out).

So, if you wanted to get the Google ranking of a bunch of domains when searching for a particular term, you could use one of the many SEO page ranking test sites that are available. These are a pain, though, in as much as they require you to enter the search term and the domain name you are looking for, and they give you the ranking (what position in the results the domain name comes). That is fine for individual searches (like what position is kapie.com if I search on ‘Ken Hughes’), but not very good for comparing multiple domains against the same search term.

I looked at using Google’s Search API to get this info, but unfortunately it only returns 4 or 8 results (it is mainly designed to present some brief results in a box on your website). What I needed was to look at a lot more results (like up to 500)…

Back to my trusty friend – PowerShell…

I create a web client, have it download the first X (500) results to the search term, load the link Url and the position into a hashtable and then lookup the hashtable to find the rank position of each of the domain names I am looking for.
It was actually pretty easy, the only difficult part was getting the regex(s) correct – Regex is evil, as evil as Perl….

Here is the script code :

  $domainNames = "google.com", "live.com", "bing.com", "yahoo.com"
  $maxResults = 100
  $searchTerm = "search"

  $urlPattern = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>" 
  $hitPattern = "<\s*(h3)\sclass=r>(.*?)</\1>"

  $wc = new-object "System.Net.WebClient"
  $urlRegex = New-Object System.Text.RegularExpressions.Regex $urlPattern
  $hitRegex = New-Object System.Text.RegularExpressions.Regex $hitPattern
  $urls = @{}

  $resultsIndex = 0
  $count = 1
  while($resultsIndex -lt $maxResults)
  {
    $inputText = $wc.DownloadString("http://www.google.com/search?q=$searchTerm&start=$resultsIndex")
   
    "Parsing : " + $resultsIndex

    $index = 0
    while($index -lt $inputText.Length)
    {
      $match = $hitRegex.Match($inputText, $index)
      if($match.Success -and $match.Length -gt 0)
      {
        $urlMatch = $urlRegex.Match($match.Value.ToString())
        if(($urlMatch.Success) -and ($urlMatch.Length -gt 0))
        {
          $newKey = $urlMatch.Groups[1].Value.ToString()
          if(!$urls.ContainsKey($newKey))
          {
            $urls.Add($newkey, $count)
          }
          $count++
        }
        $index = $match.Index + $match.Length
      }
      else
      {
        $index = $inputText.Length
      }
    }
    $resultsIndex += 10
  }


  foreach($domain in $domainNames)
  {
    $maxPos = -1
    foreach($key in $urls.Keys)
    {
      if($key.Contains($domain))
      { 
        $pos = [int] $urls[$key]
        if(($pos -lt $maxPos) -or ($maxPos -eq -1))
        {
          $maxPos = $pos
        }
      }
    }
    if($maxPos -eq -1)
    {
      $domain + " : Not Found"
    }
    else
    {
      $domain + " : Found at result #" + $maxPos
    }
  }
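
To see what the two regex patterns actually pick out, here is a small offline sketch that runs them against a made-up snippet of HTML instead of a live results page. The sample markup and the example.com / example.org Urls are invented for illustration, and it uses [regex]::Matches to walk the hits rather than the manual index loop in the script above.

```powershell
# Same patterns as in the script above
$urlPattern = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
$hitPattern = "<\s*(h3)\sclass=r>(.*?)</\1>"

# Made-up stand-in for a downloaded results page
$sampleHtml = '<h3 class=r><a href="http://www.example.com/a">First</a></h3>' +
              '<p>noise <a href="http://ignored.example/">not a hit</a></p>' +
              '<h3 class=r><a href="http://www.example.org/b">Second</a></h3>'

$rank = 1
$ranks = @{}
foreach ($hit in [regex]::Matches($sampleHtml, $hitPattern))
{
    # Only links inside an <h3 class=r> block count as search hits
    $urlMatch = [regex]::Match($hit.Value, $urlPattern)
    if ($urlMatch.Success)
    {
        $url = $urlMatch.Groups[1].Value
        if (-not $ranks.ContainsKey($url)) { $ranks[$url] = $rank }
        $rank++
    }
}
$ranks
```

The link inside the plain paragraph is skipped because it is not wrapped in a hit block, which is the whole point of matching the hit pattern first.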

Drop me a line in the comments if you find it useful…

Posted: Wednesday, June 03, 2009 11:43:37 PM (GMT Daylight Time, UTC+01:00)  #   Comments [0]
TAGS: PowerShell | Scripting | Web

A while back I restructured my website so that this blog no longer started at the root, instead starting from /blog. This was so that I could introduce some other web apps and have a subfolder for projects etc.

One of the pains of this restructure was modifying all the links. I thought I had caught all of these with a Redirector HttpModule, but recently realised that for some reason I had not caught images embedded in the posts themselves.
Also, it was becoming a pain having to remember to include the HttpModule in my web.config every time I upgraded my blog (dasBlog).

I wanted it fixed properly this time, so grabbed a copy of all the XML files in my 'content' folder, copied them to a local folder and cracked open PowerShell...

I wanted every instance of www.mywebsite.com changed to www.mywebsite.com/blog - not difficult, but this would also change valid urls such as www.mywebsite.com/blog/page.aspx to www.mywebsite.com/blog/blog/page.aspx (note the /blog/blog in the url)

So I got everything I needed done with two 'one liners' in PowerShell...

dir | %{ $a = get-content $_ ; $a = $a -replace ("www.mywebsite.com", "www.mywebsite.com/blog") ; set-content $_ $a }

...and...

dir | %{ $a = get-content $_ ; $a = $a -replace ("www.mywebsite.com/blog/blog", "www.mywebsite.com/blog") ; set-content $_ $a }

All fixed...
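
With hindsight, the two passes could probably be folded into one by using a regex negative lookahead, since -replace takes a regular expression. This is only a sketch: the helper name Add-BlogPrefix is made up, and the pattern also escapes the dots, which the one-liners above leave as 'match any character'.

```powershell
# Hypothetical helper: adds /blog after the host name unless it is already there
function Add-BlogPrefix([string]$text)
{
    # (?!/blog) is a negative lookahead: match only when "/blog" does NOT follow
    $text -replace 'www\.mywebsite\.com(?!/blog)', 'www.mywebsite.com/blog'
}

Add-BlogPrefix 'http://www.mywebsite.com/page.aspx'
Add-BlogPrefix 'http://www.mywebsite.com/blog/page.aspx'
```

The first call gains the /blog prefix and the second comes back unchanged. One caveat: a path that merely starts with /blog (say /blogroll) would be skipped as well, so the lookahead would need tightening if such paths existed.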

 

Posted: Sunday, July 06, 2008 3:35:38 PM (GMT Daylight Time, UTC+01:00)  #   Comments [1]
TAGS: Dasblog | PowerShell | Scripting | Web

I have been playing with Twitter recently and thought it might be neat to see if I could post a 'tweet' from PowerShell. There is a great Google Group that discusses their API. The APIs are all REST based and really easy to use - the only complexity is that you need HTTP Basic Authentication to do anything 'real'.

One of the simpler API calls is to get the public timeline. No authentication is required for this, so you can simply paste the URL into your browser and get back the data (XML format, but JSON and other formats are also available). Try this:

http://twitter.com/statuses/public_timeline.xml

Now, for doing an update we need the following API:

update

Updates the authenticating user's status.  Requires the status parameter specified below.  Request must be a POST.

URL: http://twitter.com/statuses/update.format

Formats: xml, json.  Returns the posted status in requested format when successful.

Parameters:

  • status.  Required.  The text of your status update.  Be sure to URL encode as necessary.  Must not be more than 160 characters and should not be more than 140 characters to ensure optimal display.

The fact it must be a POST led me to use an HttpWebRequest (as opposed to the easier WebClient). Anyway, here is the PowerShell function:

function Send-Tweet([string]$text, [string]$username, [string]$password)
{
     # HttpUtility lives in System.Web, which is not loaded by default (Add-Type needs PowerShell 2.0+)
     Add-Type -AssemblyName System.Web

     $updateurl = "http://twitter.com/statuses/update.xml"
     $result = $null
     $text = [System.Web.HttpUtility]::UrlEncode($text)

     # Build the POST request with HTTP Basic Authentication credentials
     [System.Net.HttpWebRequest] $request = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($updateurl)
     $request.Credentials = new-object System.Net.NetworkCredential($username, $password)
     $request.Method = "POST"
     $request.ContentType = "application/x-www-form-urlencoded"
     $param = "status=" + $text
     $sourceParam = "&source=PowerShell"
     $request.ContentLength = $param.Length + $sourceParam.Length

     # Write the form-encoded body (plain ASCII once URL encoded)
     [System.IO.StreamWriter] $stOut = new-object System.IO.StreamWriter($request.GetRequestStream(), [System.Text.Encoding]::ASCII)
     $stOut.Write($param)
     $stOut.Write($sourceParam)
     $stOut.Close()

     # Read the response and report the new status (or the error)
     [System.Net.HttpWebResponse] $response = [System.Net.HttpWebResponse] $request.GetResponse()
     if ($response.StatusCode -ne 200)
     {
           $result = "Error : " + $response.StatusCode + " : " + $response.StatusDescription
     }
     else
     {
           $sr = New-Object System.IO.StreamReader($response.GetResponseStream())
           [xml]$xml = [xml]$sr.ReadToEnd()
           $id = $xml.status.id
           $tweet = $xml.status.text
           if ($tweet.length -gt 50) { $tweet = $tweet.Substring(0,50) + "...(truncated)" }
           $result = "Tweet " + $id + " added : " + $tweet
     }

     return $result
}

And to use it :

send-tweet "I'm sending updates from PowerShell, cool or what ??" "<your_username>" "<your_password>"
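
The request body that the function writes is just two URL-encoded form fields joined with an ampersand. As a rough sketch, it can be built and measured on its own, without touching the network. The helper name New-StatusBody is made up, and it uses [System.Uri]::EscapeDataString in place of HttpUtility.UrlEncode, so spaces come out as %20 rather than +.

```powershell
# Hypothetical helper: builds the form-encoded POST body for a status update
function New-StatusBody([string]$status, [string]$source = "PowerShell")
{
    "status=" + [System.Uri]::EscapeDataString($status) +
    "&source=" + [System.Uri]::EscapeDataString($source)
}

$body = New-StatusBody "cool or what ??"

# ContentLength is a byte count; once escaped the body is plain ASCII,
# so the byte count and the string length agree
$byteCount = [System.Text.Encoding]::ASCII.GetByteCount($body)

$body
$byteCount
```

Measuring the encoded bytes rather than the string length is the safer habit, even though the two match here.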

 
Posted: Tuesday, April 01, 2008 10:23:11 AM (GMT Daylight Time, UTC+01:00)  #   Comments [1]
TAGS: PowerShell | Twitter

It's time to flex those scripting fingers and get your brain warmed up - The 2008 Scripting Games are coming.

Put a note in your calendars - February 15-March 3, 2008.

Stay tuned to the Scripting Games Tips for some (possible) hints and useful techniques that the tasks may involve...

Posted: Monday, January 07, 2008 11:09:59 AM (GMT Standard Time, UTC+00:00)  #   Comments [0]
TAGS: PowerShell | Scripting

One of my colleagues switched me on to PowerShell Plus and I'm loving it.

Code editor, snippets, values of variables, logging tools and much more, including a really neat feature called 'MiniMode' (see the toolbar icon at the extreme right in the image).

This 'MiniMode' closes all toolbars/toolwindows except the main console but also makes the console window transparent (user configurable level of transparency). This mode is real easy to work with...

There is a free single user license for non commercial use.

I encourage you to try it out.

Posted: Monday, December 17, 2007 10:53:02 PM (GMT Standard Time, UTC+00:00)  #   Comments [0]
TAGS: PowerShell | Scripting | Software | Tools

PowerShell has been around for some time now, what with betas and CTPs. For (the released version of) Vista it became available a month or so ago.

It's been on my 'must get to grips with' list for a while now and I've kinda been following some blogs about it, slowly getting a little knowledge here and there.

Mr Hanselman (the oracle for all things technical) has done a couple of podcasts on it and I listened a couple of weeks ago to an episode of Hanselminutes where he interviewed Bruce Payette. Bruce is the language architect for PowerShell and has just (2 weeks ago) released a book on it (Windows PowerShell in Action).

Bought the book last week in the US and have had it open ever since. Everything I do, I now do with a rosy PowerShell perspective. The only way (I find) to really get to grips with something is to completely immerse yourself in it - think in it, live it, breathe it....

Today I have been updating the server side file for our Archive One (email archiving for Exchange) auto update feature (we've just released V5.0 SR1, so the build numbers that are checked have changed). It's a simple XML file that is parsed for the 'GA' release version (ProdVer) and the 'HF' version (hotfix). It looks like this (sample only):

<?xml version="1.0"?>
<Versions>
  <AOnePolService ProdVer="5.0.0.1643" HotFix="5.0.0.1643"></AOnePolService>
  <AOneCmplService ProdVer="4.3.0.1077" Hotfix="4.3.0.1094"></AOneCmplService>
</Versions>

I wanted to provide a quick and easy way to find the latest version of each product. I came up with the following PowerShell one liner (split over three lines for readability):

([XML] (new-object ("net.webclient")).Downloadstring(
"http://support.c2c.com/versioncheck/currentversions.xml")).versions.get_ChildNodes() | 
% { "" } { $_.psbase.Name + "`t GA=" + $_.prodver + "`t HF=" + $_.hotfix } { "" }


This downloads the XML file, parses it and lists out the product, GA version and HF version.
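
To try the parsing part without hitting the server, here is a small sketch that runs against an inline copy of the sample. Two things are assumptions on my part: the attribute casing is normalised to HotFix on both elements, and the values are read with GetAttribute (which is case sensitive) rather than through the property shortcuts used in the one liner.

```powershell
# Inline sample standing in for the downloaded file (attribute casing normalised)
[xml]$doc = @'
<?xml version="1.0"?>
<Versions>
  <AOnePolService ProdVer="5.0.0.1643" HotFix="5.0.0.1643"></AOnePolService>
  <AOneCmplService ProdVer="4.3.0.1077" HotFix="4.3.0.1094"></AOneCmplService>
</Versions>
'@

# One line per product: element name, GA version, hotfix version
$report = foreach ($node in $doc.Versions.ChildNodes)
{
    "{0}`t GA={1}`t HF={2}" -f $node.psbase.Name, $node.GetAttribute("ProdVer"), $node.GetAttribute("HotFix")
}
$report
```

Swapping the here-string for the Downloadstring call from the one liner should give the same report for the live file, provided the attribute names match.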

Watch this space for some Active Directory related stuff as I have been very active in scripting AD over the past few days and am in the process of porting it to PowerShell.

Posted: Monday, February 26, 2007 5:57:22 PM (GMT Standard Time, UTC+00:00)  #   Comments [0]
TAGS: Scripting | PowerShell
     
 
 
Copyright © 2010 Ken Hughes. All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.