Data extract from HTML file

Started by Inkblot, Aug 02, 2010, 15:41:50

Inkblot

I am using WinAudit to audit all of the laptops at work and it works superbly. As always, however, I am looking for a little more from it. Specifically, I want to be able to extract information from the HTML files it generates: things like name/model/serial, as well as total RAM, MAC addresses and suchlike. All of these are in the HTML file and are always named in the same fashion:

<tr><td>Computer Name</td><td>LAPTOP000239</td></tr>
<tr><td>Total Memory</td><td>2944MB</td></tr>
<tr bgcolor="#f2f2f2"><td>MAC Address</td><td>70:F1:A1:xx:xx:xx</td></tr>

Is there a simple (and cheap!) way to extract data from HTML files? Ideally I want it to end up in something like Excel so I can run reports on, say, all laptops with less than 2GB of RAM, or all laptops with a particular type of wireless card. I have about 150 audits so far (with more to be done) and finding information manually is a pain. As the filename is always the same (each audit is in its own directory) and the field names I want are always the same, I'm hoping I can run a simple script and generate an Excel spreadsheet with my chosen fields. Probably wishful thinking, but I've got to ask!
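
A minimal sketch of the extraction step, assuming every field sits in a two-cell row shaped like the examples above (<td>Label</td><td>Value</td>, possibly with attributes such as bgcolor on the <tr>). The helper name extract_fields and the regex are illustrative, not part of WinAudit; if the markup varies much, a proper HTML parser would be more robust.

```python
import re

# Matches a label cell immediately followed by a value cell, e.g.
# <td>Total Memory</td><td>2944MB</td>. Attributes on the <tr> don't matter
# because the pattern only looks at the adjacent <td> pair.
ROW_PATTERN = re.compile(r"<td>([^<]*)</td><td>([^<]*)</td>", re.IGNORECASE)

def extract_fields(html):
    """Return a dict mapping each label cell to its value cell (hypothetical helper)."""
    return {label.strip(): value.strip() for label, value in ROW_PATTERN.findall(html)}

# The three rows quoted above.
sample = """
<tr><td>Computer Name</td><td>LAPTOP000239</td></tr>
<tr><td>Total Memory</td><td>2944MB</td></tr>
<tr bgcolor="#f2f2f2"><td>MAC Address</td><td>70:F1:A1:xx:xx:xx</td></tr>
"""

fields = extract_fields(sample)
print(fields["Computer Name"], fields["Total Memory"], fields["MAC Address"])
# LAPTOP000239 2944MB 70:F1:A1:xx:xx:xx
```

Because the result is a dict, a label that appears more than once in a report (several MAC Address rows on a machine with more than one network adapter, say) keeps only its last value; collecting values into lists would avoid that.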

Rik

Rik
--------------------

This post reflects my own views, opinions and experience, not those of IDNet.

Ray

Is saving as a csv file any good to you, Inky?

Inkblot

Quote from: Ray on Aug 02, 2010, 15:51:06
Is saving as a csv file any good to you, Inky?

Possibly, although I would still have to work out how to extract the data from it! Also, I already have 150 of them as HTML and can't really start over using .csv instead unless it *really* is the only option - it's taken 3 months to get this far!
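
Once the data is in CSV form (one row per laptop), pulling a report out of it takes only a few lines with Python's standard csv module. A sketch, assuming hypothetical column names "Computer Name" and "Total Memory", with memory written like 2944MB; apart from LAPTOP000239, the sample rows are invented for illustration, and the helper names are hypothetical.

```python
import csv
import io
import re

def memory_mb(value):
    """Parse a value like '2944MB' into an int (assumes the figure is in MB)."""
    match = re.match(r"\s*(\d+)", value)
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    return int(match.group(1))

def low_memory_machines(csv_file, threshold_mb=2048):
    """Yield the names of machines whose Total Memory is below threshold_mb."""
    for row in csv.DictReader(csv_file):
        if memory_mb(row["Total Memory"]) < threshold_mb:
            yield row["Computer Name"]

# Stand-in for a real audit CSV; only the first data row comes from the thread.
audit_csv = io.StringIO(
    "Computer Name,Total Memory\n"
    "LAPTOP000239,2944MB\n"
    "LAPTOP000240,1024MB\n"
    "LAPTOP000241,1536MB\n"
)

low = list(low_memory_machines(audit_csv))
print(low)  # ['LAPTOP000240', 'LAPTOP000241']
```

With a real file, replace the StringIO with open("audit.csv", newline="").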

The particular file I am interested in is <Name>_right.html (so not all the same name, but they all have _right as part of the filename), and it is of course mostly HTML stuff, which I just don't understand at all :(
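
Since every report of interest ends in _right.html and each audit sits in its own folder, Python's pathlib can collect them all with one recursive glob. A sketch; the function name find_audit_reports is hypothetical, and the demo folder and file names are invented (notes.txt stands in for any file that should be skipped). In practice you would point it at the real folder holding all the audit directories.

```python
import tempfile
from pathlib import Path

def find_audit_reports(root):
    """Return every file below root whose name ends in '_right.html', sorted by path."""
    return sorted(Path(root).rglob("*_right.html"))

# Demo: build a throwaway tree that mimics one folder per audit.
with tempfile.TemporaryDirectory() as root:
    for name in ("LAPTOP000239", "LAPTOP000240"):
        folder = Path(root) / name
        folder.mkdir()
        (folder / f"{name}_right.html").write_text("<html></html>")
        (folder / "notes.txt").write_text("not a report")  # should be ignored
    found = [p.name for p in find_audit_reports(root)]

print(found)  # ['LAPTOP000239_right.html', 'LAPTOP000240_right.html']
```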

Glenn

Inkblot

Quote from: Glenn on Aug 02, 2010, 17:19:12
Any use? http://blog.outwit.com/?p=54

Quite possibly :)

I also just discovered the 'Export data to Excel' functionality within IE8: right-click on the page, select Export data, highlight the tables you want to import and away you go. Probably similar to the OutWit way, and certainly looking quite hopeful! Thanks :)

MisterW

It's a shame you didn't use the XML format from WinAudit. It lets you view the report exactly as before in a browser (using the supplied wa_xml2html.xsl stylesheet), but you can also process the raw XML with many tools, Excel for one.
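
For anyone who does switch to XML output, Python's standard xml.etree.ElementTree can read it without any HTML guesswork. The thread doesn't show WinAudit's XML layout, so the <report>/<item name="..."> structure below is invented purely for illustration, as is the helper items_to_dict; the element and attribute names would need adjusting to whatever WinAudit actually writes.

```python
import xml.etree.ElementTree as ET

# Invented XML for illustration only; WinAudit's real element names will differ.
sample_xml = """
<report>
  <item name="Computer Name">LAPTOP000239</item>
  <item name="Total Memory">2944MB</item>
  <item name="MAC Address">70:F1:A1:xx:xx:xx</item>
</report>
"""

def items_to_dict(xml_text, tag="item", key_attr="name"):
    """Map each <tag key_attr="..."> element's attribute value to its text (hypothetical schema)."""
    root = ET.fromstring(xml_text.strip())
    return {el.get(key_attr): (el.text or "").strip() for el in root.iter(tag)}

fields = items_to_dict(sample_xml)
print(fields["Total Memory"])  # 2944MB
```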

esh


[xxxxxx@paragon reduce]$ python
Python 2.6.4 (r264:75706, Oct 27 2009, 06:25:13)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> text = '<tag1>blah</tag1><tag2>test</tag2>'
>>> import re
>>> a = re.search('<tag2>(.*)</tag2>',text)
>>> a
<_sre.SRE_Match object at 0x7fc150ea1a08>
>>> a.group(0)
'<tag2>test</tag2>'
>>> a.group(1)
'test'
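
The same idea in Python 3 syntax, applied to a WinAudit-style row. One thing worth watching: .* is greedy, so if several rows end up on a single line it runs on to the last </td>; the non-greedy .*? stops at the first. The one-line HTML below is constructed for illustration.

```python
import re

# Two rows on one line, as can happen when HTML is written without line breaks.
html = ("<tr><td>Computer Name</td><td>LAPTOP000239</td></tr>"
        "<tr><td>Total Memory</td><td>2944MB</td></tr>")

# Greedy: .* runs to the LAST </td> on the line, swallowing the next row.
greedy = re.search(r"<td>Computer Name</td><td>(.*)</td>", html).group(1)

# Non-greedy: .*? stops at the FIRST </td> after the value.
lazy = re.search(r"<td>Computer Name</td><td>(.*?)</td>", html).group(1)

print(greedy)  # LAPTOP000239</td></tr><tr><td>Total Memory</td><td>2944MB
print(lazy)    # LAPTOP000239
```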


Well, Python is free. Plus you can tailor a script like this to your heart's content, including having it output CSV files. Shout if you need any more tips.
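
Writing the results out as CSV, which Excel opens directly, is handled by the standard csv module. A sketch, assuming one dict of label/value pairs per laptop; the column list and the records (including the "Operating System" field) are invented for illustration, and write_report is a hypothetical helper.

```python
import csv
import io

# Fields to report on; the names must match the label cells in the HTML.
COLUMNS = ["Computer Name", "Total Memory", "MAC Address"]

# Invented example records, one dict per laptop.
records = [
    {"Computer Name": "LAPTOP000239", "Total Memory": "2944MB",
     "MAC Address": "70:F1:A1:xx:xx:xx"},
    {"Computer Name": "LAPTOP000240", "Total Memory": "1024MB",
     "Operating System": "Windows XP"},
]

def write_report(rows, out):
    """Write the chosen COLUMNS to out as CSV; missing fields become blank cells."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()
write_report(records, buffer)
print(buffer.getvalue())
```

restval fills any column a report lacks with an empty cell, and extrasaction="ignore" drops fields you didn't ask for. For a real file, open it with open("report.csv", "w", newline="") so the csv module controls the line endings.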
CompuServe 28.8k/33.6k 1994-1998, BT 56k 1998-2001, NTL Cable 512k 2001-2004, 2x F2S 1M 2004-2008, IDNet 8M 2008 - LLU 11M 2011