regex - Regular expression for nested tags (Wikimedia content)
I have not done regex in a while and am slightly rusty.
I am trying to extract categories from a Wikipedia entry. What I need is each individual string contained in a pattern that starts with two open brackets and ends with two closed brackets.
This query works most of the time -
(? [?<grade> * [^ \] #]) ( [\]]
But it has problems when there is a comma (',') inside the closing brackets.
The unfortunate result is that when the following text is parsed:
lower = = [[Seattle, Washington]], [[United States | United States]] |
the "category" it returns is the following:
Seattle, Washington]], [[Joint United States | USA]
Clearly, the comma is throwing it off and it is running on into the next set. What is the best way to capture each value between the open and closed double brackets? What is the problem?
The problem is not the comma. The problem is that
.*
is greedy: it matches as much as possible, so it will happily match "]], [[" along with everything else. You can use a non-greedy match, as has been suggested, or change .* to
[^\]]*
Changing the greedy match to "anything except a closing bracket" should also do the trick.
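To make the difference concrete, here is a minimal sketch in Python (the question's named-group syntax looks like .NET, so the language is an assumption on my part, as are the sample string's field name and the simplified patterns, which only keep the double-bracket delimiters). It compares the greedy match with the two suggested fixes:

```python
import re

# Sample text modeled on the question; "place" is a placeholder field name.
text = "place = [[Seattle, Washington]], [[United States|USA]]"

# Greedy: .* runs to the LAST "]]" on the line, swallowing "]], [[".
greedy = re.findall(r"\[\[(.*)\]\]", text)

# Non-greedy: .*? stops at the FIRST "]]" it can reach.
lazy = re.findall(r"\[\[(.*?)\]\]", text)

# Negated class: the match cannot cross a closing bracket at all.
negated = re.findall(r"\[\[([^\]]*)\]\]", text)

print(greedy)   # one long, wrong capture
print(lazy)     # one capture per [[...]] pair
print(negated)  # same captures as the non-greedy version
```

The negated class usually backtracks less than the non-greedy version, but on input like this either one gives the same captures.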
In addition, these are not "nested" tags. Nested would be [[tags [[inside]] tags]], which is probably not what you want, because I do not think that means anything in Wikimedia markup.
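If nested input ever did turn up, a single regex pass would not pair the brackets correctly. The sketch below (Python again, with a made-up nested string) shows what the negated-class pattern does with it, and a variant that also excludes opening brackets so only the innermost pair matches. This is an illustration of the limitation, not a parser for Wikimedia markup:

```python
import re

# Hypothetical nested input, only to show how the patterns behave.
nested = "[[tags [[inside]] tags]]"

# Excluding only "]" lets the match start at the outer "[[" and stop at the
# first "]]", so the capture straddles the inner opening brackets.
outer_attempt = re.findall(r"\[\[([^\]]*)\]\]", nested)

# Excluding both "[" and "]" forces the match onto the innermost pair.
innermost = re.findall(r"\[\[([^\[\]]*)\]\]", nested)

print(outer_attempt)  # ['tags [[inside']
print(innermost)      # ['inside']
```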