Asked 1 month ago by MercurialSatellite233
How can I modify my regex to avoid replacing 'ABC' when followed by '.png' or ' thumb.png'?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I'm trying to replace the word 'ABC' with 'bueno', but only when it isn't immediately followed by either ".png" or " thumb.png". I expected the following code to produce the output:
```python
import re

texto = "ABC ABC. ABC.png ABC thumb.png"
regex = r"ABC(?!.png)|ABC(?! thumb.png)"
novo = re.sub(regex, "bueno", texto)
print(novo)
```
The expected output is:
```
bueno bueno. ABC.png ABC thumb.png
```
However, the actual output is:
```
bueno bueno. bueno.png bueno thumb.png
```
It seems that the regex is not correctly detecting the space and is interfering with the first condition. I've read some answers suggesting the use of a single negative lookahead to check for either exception. How can I combine these exceptions properly so that 'ABC' is only replaced when not followed by ".png" or " thumb.png"?
Comments 0 • Answers 5 • Views 883
Answer by TychoAI Agent • 1 month ago
Your issue is caused by using separate alternatives for the negative lookahead. In your regex
```python
regex = r"ABC(?!.png)|ABC(?! thumb.png)"
```
the two alternatives check only one exception at a time. For example, when processing "ABC.png", the first alternative fails (as expected) but the second alternative (which only checks for " thumb.png") passes because ".png" doesn’t match " thumb.png". This causes the substitution to occur where it shouldn’t.
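To see this concretely, each alternative can be run on its own and compared with the full alternation. This is only an illustrative sketch; the helper `starts` is a made-up name, not part of `re`.

```python
import re

texto = "ABC ABC. ABC.png ABC thumb.png"

# The two alternatives from the original pattern, tried separately.
first = r"ABC(?!.png)"
second = r"ABC(?! thumb.png)"

def starts(pattern, text):
    """Return the start offset of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]

first_hits = starts(first, texto)    # skips the ABC at offset 9 (before ".png")
second_hits = starts(second, texto)  # skips the ABC at offset 17 (before " thumb.png")
either_hits = starts(first + "|" + second, texto)

print(first_hits)   # [0, 4, 17]
print(second_hits)  # [0, 4, 9]
print(either_hits)  # [0, 4, 9, 17]
```

Each alternative rejects a different occurrence, so their union covers every "ABC" in the string.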
The solution is to combine your exceptions into a single negative lookahead that checks for either ".png" or " thumb.png". Also, remember to escape periods (".") since they are special characters in regex. The corrected regex is:
```python
import re

texto = "ABC ABC. ABC.png ABC thumb.png"
regex = r"ABC(?!(\.png| thumb\.png))"
novo = re.sub(regex, "bueno", texto)
print(novo)
```
With this pattern, after "ABC" the regex checks that what follows is neither ".png" nor " thumb.png". The output now becomes:
```
bueno bueno. ABC.png ABC thumb.png
```
which matches your expectations.
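If the list of exceptions grows, the same single-lookahead pattern could be generated from a list of literal suffixes. The following is a minimal sketch, assuming a hypothetical helper named `replace_unless_followed_by` (it is not part of `re`); `re.escape` handles the dots so they do not need to be escaped by hand.

```python
import re

def replace_unless_followed_by(text, word, replacement, suffixes):
    """Replace word unless one of the literal suffixes follows it directly.

    Hypothetical helper for illustration only.
    """
    if not suffixes:
        # No exceptions: plain literal replacement.
        return re.sub(re.escape(word), replacement, text)
    # re.escape makes each suffix literal, so "." only matches a real dot.
    blocked = "|".join(re.escape(s) for s in suffixes)
    pattern = re.escape(word) + "(?!" + blocked + ")"
    return re.sub(pattern, replacement, text)

novo = replace_unless_followed_by(
    "ABC ABC. ABC.png ABC thumb.png", "ABC", "bueno", [".png", " thumb.png"]
)
print(novo)  # bueno bueno. ABC.png ABC thumb.png
```

Note that `re.sub` still interprets backslashes in the replacement string, so this sketch assumes a plain replacement such as "bueno".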
Answer by GalacticScholar846 • 1 month ago
This will replace 'ABC' with 'bueno' only if it is not followed by '.png' or ' thumb.png'.
```python
import re

texto = "ABC ABC. ABC.png ABC thumb.png"
regex = r"\bABC\b(?!\.png|\s?thumb\.png)"
novo = re.sub(regex, "bueno", texto)
print(novo)
```
RESULT:
```
bueno bueno. ABC.png ABC thumb.png
```
NOTE: In the regex, I added \b around ABC (=> \bABC\b), so that it will not match if there is another word character directly before the letter A or after the letter C. For example, it will not match 'ABCd' or 'zABCd'.
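The effect of the word boundaries can be checked by running the pattern with and without them. A small sketch; the sample string is made up for illustration.

```python
import re

with_boundaries = r"\bABC\b(?!\.png|\s?thumb\.png)"
without_boundaries = r"ABC(?!\.png|\s?thumb\.png)"

# Made-up sample containing ABC inside longer words.
sample = "ABC ABCd zABCd ABC.png"

bounded = re.sub(with_boundaries, "bueno", sample)
unbounded = re.sub(without_boundaries, "bueno", sample)

print(bounded)    # bueno ABCd zABCd ABC.png
print(unbounded)  # bueno buenod zbuenod ABC.png
```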
Answer by MeteoricKeeper562 • 1 month ago
You can make " thumb" optional inside the lookahead:
```regex
\bABC(?!(?: thumb)?\.png)
```
The regex matches:
- `\bABC` Match ABC preceded by a word boundary to prevent a partial word match
- `(?!` Negative lookahead, assert that what is directly to the right is not
  - `(?: thumb)?` Optionally match " thumb"
  - `\.png` Match .png
- `)` Close the lookahead

See a regex101 demo.
Note that if .png should also not have a partial word match at the end, you can also add a word boundary, like \.png\b
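The difference the trailing boundary makes shows up when ".png" is followed by more word characters. A short sketch with a made-up sample string:

```python
import re

loose = r"\bABC(?!(?: thumb)?\.png)"
strict = r"\bABC(?!(?: thumb)?\.png\b)"

# ".pngx" is an invented extension used only to show the difference.
sample = "ABC.png ABC.pngx ABC thumb.png"

loose_result = re.sub(loose, "bueno", sample)
strict_result = re.sub(strict, "bueno", sample)

print(loose_result)   # ABC.png ABC.pngx ABC thumb.png
print(strict_result)  # ABC.png bueno.pngx ABC thumb.png
```

Without the boundary, ".pngx" still counts as starting with ".png", so that ABC is left alone; with it, only a complete ".png" blocks the replacement.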
Answer by PlutonianScout835 • 1 month ago
The OR (|) needs to be inside the negative lookahead:
```regex
ABC(?!\.png| thumb\.png)
```
Let's take the substring ABC.png as an example of what was happening with your original pattern. The regex engine would first try to match this using ABC(?!.png), which would fail because of the negative lookahead. Then, because of the OR, it would jump to the next case, ABC(?! thumb.png). This would match the ABC in ABC.png, which means the string would get replaced.
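A side note on the dots: in a regex an unescaped . matches any character, so it may be worth comparing the unescaped and escaped forms of the combined lookahead. A sketch with a made-up sample string:

```python
import re

unescaped = r"ABC(?!.png| thumb.png)"
escaped = r"ABC(?!\.png| thumb\.png)"

# Invented inputs where a character other than "." sits before "png".
sample = "ABC.png ABC-png ABC thumb_png"

unescaped_result = re.sub(unescaped, "bueno", sample)
escaped_result = re.sub(escaped, "bueno", sample)

print(unescaped_result)  # ABC.png ABC-png ABC thumb_png
print(escaped_result)    # ABC.png bueno-png bueno thumb_png
```

With unescaped dots, "-png" and " thumb_png" are also treated as exceptions, which is probably not intended.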
Answer by QuasarSentinel796 • 1 month ago
Starting with your original pattern:
```regex
ABC(?!\.png)|ABC(?! thumb\.png)
```
(Note: Dot is a regex metacharacter and should be escaped with a backslash.)
This will match ABC which is not followed by .png, or ABC which is not followed by " thumb.png". Every possible occurrence of ABC satisfies at least one of these two conditions, because the text after ABC cannot start with both .png and " thumb.png" at the same time. Therefore, all occurrences of ABC will be matched.
We can write the following correction:
```regex
\bABC(?!\.png| thumb\.png)
```
This pattern says to match:
- `\b` word boundary
- `ABC` match ABC
- `(?!\.png| thumb\.png)` neither .png nor " thumb.png" follows

The negative lookahead used here basically has AND-flavored logic: ABC matches only when both excluded suffixes are absent.
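One way to see the AND-flavored logic is to write the single lookahead as two consecutive negative lookaheads, which must both succeed at the same position. A sketch comparing the two forms on the question's input (the `chained` variant is added here for illustration and is not from the answers above):

```python
import re

combined = r"\bABC(?!\.png| thumb\.png)"
# Equivalent form: not followed by ".png" AND not followed by " thumb.png".
chained = r"\bABC(?!\.png)(?! thumb\.png)"

texto = "ABC ABC. ABC.png ABC thumb.png"

combined_result = re.sub(combined, "bueno", texto)
chained_result = re.sub(chained, "bueno", texto)

print(combined_result)  # bueno bueno. ABC.png ABC thumb.png
print(chained_result)   # bueno bueno. ABC.png ABC thumb.png
```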