Strip HTML using LotusScript

I needed a LotusScript routine to strip HTML out of some text I was importing from an ODBC data store. A quick Google search turned up a pretty nice start posted by Colin Williams. It wasn't "generic" enough for me, so I did a little work on it and came up with the following function and its helper function. Enjoy:

Function StripHTML (strSource As String, bool_StripOrphans As Boolean) As String
%REM
This function strips HTML tags from a passed-in string
and returns the resulting string.

Orphan Tags ("<" & ">") are handled based on the value of bool_StripOrphans.
The Orphan Tags are removed if bool_StripOrphans is True,
and are ignored otherwise.
%END REM

    ' Positions are declared As Long: an Integer would overflow
    ' once a position passes 32,767 characters.
    Dim lngPosOpen As Long
    Dim lngPosClose As Long
    Dim strTarget As String

    strTarget$ = strSource

    If bool_StripOrphans Then
        ' Strip out Orphan Tags
        Do
            lngPosOpen& = Instr(strTarget$, "<")
            lngPosClose& = Instr(strTarget$, ">")

            If (lngPosOpen& < lngPosClose&) Then
                ' Either the first open indicator occurs prior to the first close indicator,
                ' or doesn't exist at all.
                If (lngPosOpen& = 0) Then
                    ' The first open indicator doesn't exist.
                    ' If the Orphan close indicator exists, then strip it out.
                    If (lngPosClose& > 0) Then strTarget$ = StripFirstSubstr(strTarget$, ">")
                Else
                    ' The first open indicator exists, and occurs prior to the first close indicator.
                    ' THIS INDICATES STANDARD MARKUP. STRIP IT OUT
                    strTarget$ = StripFirstSubstr(strTarget$, _
                        Mid$(strTarget$, lngPosOpen&, (lngPosClose& - lngPosOpen&) + 1))
                End If ' lngPosOpen& = 0
            Else
                ' Either the first close indicator occurs prior to the first open indicator,
                ' or doesn't exist at all.
                If (lngPosClose& = 0) Then
                    ' The first close indicator doesn't exist.
                    ' If the Orphan open indicator exists, then strip it out.
                    If (lngPosOpen& > 0) Then strTarget$ = StripFirstSubstr(strTarget$, "<")
                Else
                    ' The first close indicator occurs prior to the first open indicator,
                    ' and is therefore an Orphan. Strip it out.
                    strTarget$ = StripFirstSubstr(strTarget$, ">")
                End If ' lngPosClose& = 0
            End If ' lngPosOpen& < lngPosClose&
        Loop While ((lngPosOpen& + lngPosClose&) > 0)

    Else
        ' Orphan tags are to be ignored.
        Do
            lngPosOpen& = Instr(strTarget$, "<")
            If (lngPosOpen& > 0) Then
                ' An open indicator exists. Find the subsequent close indicator
                lngPosClose& = Instr(lngPosOpen&, strTarget$, ">")
            Else
                ' No open indicator exists. Set the close position to zero and bail out.
                lngPosClose& = 0
            End If ' lngPosOpen& > 0

            If (lngPosClose& > lngPosOpen&) Then
                ' The first open indicator exists, and a close indicator follows it.
                ' THIS INDICATES STANDARD MARKUP. STRIP IT OUT
                strTarget$ = StripFirstSubstr(strTarget$, _
                    Mid$(strTarget$, lngPosOpen&, (lngPosClose& - lngPosOpen&) + 1))
            Else
                ' No complete tag remains (no open indicator, or no close indicator after it).
                ' Set the open position to zero and bail out.
                lngPosOpen& = 0
            End If ' lngPosClose& > lngPosOpen&
        Loop While ((lngPosOpen& + lngPosClose&) > 0)
    End If ' bool_StripOrphans

    StripHTML$ = strTarget$
End Function ' StripHTML

Function StripFirstSubstr (strSource As String, strSubstr As String) As String
%REM
This function strips the first occurrence of a substring from a string,
and returns the result.
If the substring is not contained within the source string,
this function returns the source string.
%END REM

    If (Instr(strSource$, strSubstr$) > 0) Then
        ' Keep everything to the left and right of the first occurrence.
        StripFirstSubstr$ = Strleft(strSource$, strSubstr$) & Strright(strSource$, strSubstr$)
    Else
        StripFirstSubstr$ = strSource$
    End If ' (Instr(strSource$, strSubstr$) > 0)
End Function ' StripFirstSubstr
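
For anyone who wants to see the two modes side by side, here is a minimal usage sketch. It assumes both functions above have been pasted into the same agent; the variable name strRaw and the sample text are made up for illustration.

```lotusscript
Sub Initialize
    ' strRaw is a hypothetical sample: two real tags plus a stray ">".
    Dim strRaw As String
    strRaw$ = "<b>Hello</b> World>"

    ' Orphans ignored: the trailing ">" has no "<" before it, so it stays.
    Print StripHTML(strRaw$, False)   ' Hello World>

    ' Orphans stripped: the trailing ">" is removed as well.
    Print StripHTML(strRaw$, True)    ' Hello World
End Sub
```

One caveat that follows from how the positions are paired: a stray "<" that appears before a real tag is matched with that tag's ">", so in either mode "1 < 2 <b>x</b>" comes back as "1 x".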
-Devin

Comments

1 - Brill. But I don't think I will use this. I will have to type it over!
As a designer you know I am toooo lazy.

2 - Put the functions in one of my agents and it is working well so far. I like the option of removing orphans! You know... bad HTML does exist!!

3 - I'm glad I could help.

-Devin.

4 - I forgot to mention in the original post: one of my requirements was that the code conditionally (based on a parameter) strip out Orphan ("<" & ">") Tags. It does.
-Devin.
