[Javascript] regexp - how to exclude a substring?


  • From: paul at novitskisoftware.com (Paul Novitski)
  • Subject: [Javascript] regexp - how to exclude a substring?
  • Date: Sat May 21 16:52:25 2005

Shawn et al.,

I'm parsing some HTML using regular expressions, but I'm stumped on one point:

I want to find a string that begins with "<div", ends with "</div", and
does not enclose a nested "</div".

I'm starting by locating a start & end tag pair:

	/<div.*>.*<\/div/si

[s = dot also matches newlines, i = case-insensitive]

I'm actually locating a specific tag using a regexp like this:

	/<div [^>]*id="target".*>.*<\/div/si

That finds my starting & closing tags, but if I've got multiple divs it 
finds everything up to & including the final </div on the page.

Therefore as my next step I need to know how to exclude "</div" from the 
innerHTML of the div.  I've tried (.*(<\/div){0}) but it doesn't seem to work.
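
One common way to express this kind of exclusion is a negative lookahead that is tested before each character is consumed. The following is only a minimal sketch, not a tested answer to the problem above: matchFlatDiv and the sample string are hypothetical, [\s\S] stands in for a dot that matches newlines, and the .*> after the id attribute has been swapped for [^>]*> so the opening tag cannot run past its own ">".

```javascript
// Hypothetical helper: returns the first <div id="target"> ... </div span
// whose body contains no "</div", or null if there is none.
function matchFlatDiv(html) {
  // (?:(?!<\/div)[\s\S])* consumes one character at a time, but only
  // while the text at that position does not start with "</div".
  var re = /<div [^>]*id="target"[^>]*>(?:(?!<\/div)[\s\S])*<\/div/i;
  var m = re.exec(html);
  return m ? m[0] : null;
}

// Assumed sample input: two sibling divs, the first one being the target.
var sample = '<div id="target">one</div>\n<div>two</div>';
var flat = matchFlatDiv(sample);
```

With this pattern the match stops at the first "</div" after the opening tag rather than the last one on the page.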

1) How do I say "allow any number of any characters but don't allow this 
substring"?

2) The direction I'm headed is to be able to include all nested divs in my
target div.  In other words, the range of selected text should contain an
equal number of start & end tags of the same tagName as my target tag:

	<div id="target">
		<div>blah he blah</div>
		<div>blah he blah
			<div>blah he blah</div>
		</div>
	</div>

I figure that once I solve problem 1) I'll be able to assemble a regular 
expression that allows nested tags (<div...>...</div) at least to some 
reasonable level of nesting.  Any suggestions?
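
For the nested case, one alternative to a single large expression is to let a small regexp find each div tag and keep a depth counter in ordinary code. The sketch below takes that approach and is an assumption, not a known solution from this thread: extractDiv is a hypothetical name, the id is assumed to contain no regexp metacharacters, and comments, scripts, and attribute values containing ">" are not handled.

```javascript
// Hypothetical helper: returns the outer HTML of the <div> whose id is
// targetId, including any nested divs, or null if it is absent or unclosed.
function extractDiv(html, targetId) {
  // Assumes targetId contains no regexp metacharacters.
  var open = new RegExp('<div\\b[^>]*\\bid="' + targetId + '"[^>]*>', 'i');
  var start = html.search(open);
  if (start === -1) return null;

  // Matches every opening or closing div tag; group 1 is "/" for a closer.
  var tag = /<(\/?)div\b[^>]*>/gi;
  tag.lastIndex = start; // begin scanning at the target's opening tag
  var depth = 0;
  var m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1; // opener: +1, closer: -1
    if (depth === 0) return html.slice(start, tag.lastIndex);
  }
  return null; // ran out of tags before the target div was closed
}

// Assumed sample input modeled on the nested structure shown above.
var page = 'x<div id="target"><div>a</div><div>b<div>c</div></div></div><div>after</div>';
var nested = extractDiv(page, 'target');
```

Because the counter lives outside the regexp, this handles any nesting depth instead of a fixed number of levels.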

Thanks,
Paul