Channel: UNIX and Linux Forums

Perl: Regex for Search and Replace that has a flexible match

Hi,

I'm trying to match the front and back of a sequence. An exact match works (obviously), but I need the regex to be more flexible. When we get strings of nucleotides, their prefixes and suffixes aren't always exact matches: sometimes there is an extra letter, sometimes a letter is missing, and sometimes both.

For example, if I were trying to match the string "Imhungry" at the front of a string and replace it with nothing, I would use the following code.

Code:

$sequence =~ s/^.*?Imhungry//s;
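
To make the behaviour concrete, here is a small sketch of that substitution on two made-up reads (the strings and variable names are only illustrative, not real data). It shows the exact marker being stripped and a read with one missing letter slipping through untouched:

```perl
use strict;
use warnings;

# Made-up reads: junk, then the marker, then the part we want to keep.
my $exact   = "xxImhungryACGTACGT";
my $inexact = "xxImungryACGTACGT";    # the "h" is missing

# ^.*? lazily skips ahead to the first "Imhungry"; /s lets "." match newlines.
(my $trimmed   = $exact)   =~ s/^.*?Imhungry//s;
(my $untouched = $inexact) =~ s/^.*?Imhungry//s;

print "$trimmed\n";      # ACGTACGT
print "$untouched\n";    # xxImungryACGTACGT (no match, so nothing is removed)
```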
This works great, but I need help writing some flexibility into the regex so that it also captures instances where
[1] a single letter is missing, e.g. "Imungry" or "mungry".
[2] a single letter (any letter) is added, e.g. "Immhungry" or "Imhungryy".
[3] both, e.g. "Imhungyy" or "Immungryy" (notice this last example has two single-letter duplications and one deletion).
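
One way to cover all three cases at once is to drop the regex and count edits instead. The following is only a sketch under my own assumptions, not something from the thread: `trim_fuzzy_prefix` is a hypothetical helper, the limit of 2 edits is arbitrary, and it uses a standard edit-distance table in which the match is allowed to start anywhere in the text. It stops at the first position where the edit count stops improving.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not from the thread): remove everything up to and
# including the first approximate occurrence of $pattern in $text, allowing
# at most $max_edits single-letter insertions, deletions or substitutions.
# Returns $text unchanged when nothing matches closely enough.
sub trim_fuzzy_prefix {
    my ($text, $pattern, $max_edits) = @_;
    my @p = split //, $pattern;
    my $m = scalar @p;

    # $col[$i] = fewest edits needed to turn the first $i pattern letters
    # into some substring of $text that ends at the current position.
    my @col = (0 .. $m);
    my ($best_end, $best_cost);

    my $pos = 0;
    for my $ch (split //, $text) {
        $pos++;
        my @new = (0);    # row 0 stays 0: a match may start anywhere
        for my $i (1 .. $m) {
            my $subst = $p[$i - 1] eq $ch ? 0 : 1;
            $new[$i] = min(
                $col[$i - 1] + $subst,    # letters align (same or substituted)
                $col[$i] + 1,             # extra letter in the text
                $new[$i - 1] + 1,         # pattern letter missing from the text
            );
        }
        @col = @new;

        if (!defined $best_cost) {
            next if $col[$m] > $max_edits;    # not close enough yet
            ($best_end, $best_cost) = ($pos, $col[$m]);
        }
        elsif ($col[$m] < $best_cost) {
            ($best_end, $best_cost) = ($pos, $col[$m]);    # still improving
        }
        else {
            last;    # the edit count stopped improving
        }
    }
    return defined $best_end ? substr($text, $best_end) : $text;
}

for my $read (qw(xxImhungryACGT ImungryACGT ImmhungryACGT)) {
    print trim_fuzzy_prefix($read, "Imhungry", 2), "\n";    # ACGT each time
}
```

A trailing extra letter, as in "Imhungryy", cannot be told apart from the first letter of whatever follows, so this sketch leaves it in place. How ties are broken, and how many edits to allow, are choices you would have to tune for real reads.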

Thanks!

If this is too absurd let me know.

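With a {0,2} quantifier on each letter I think I can do this, though it probably needs a minimum-length check, since every letter becomes optional and the pattern can then match an empty string.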
Code:

$sequence =~ s/^.*?I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2}//s;
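
A quick check of that pattern, plus one possible guard. This is a sketch under my own assumptions: `trim_loose` is a hypothetical helper and the minimum length of 6 is an arbitrary choice. Because every letter may occur zero times, the substitution is satisfied by an empty match at position 0 and removes nothing. The helper instead tries each start position in turn and only accepts a match that consumed enough letters.

```perl
use strict;
use warnings;

# The quantified part of the pattern, anchored to the start of
# whatever string it is applied to.
my $loose = qr/^(I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2})/;

# Hypothetical helper: walk the start positions from left to right (like the
# lazy ^.*? does) and accept the first match of at least $min_len letters.
sub trim_loose {
    my ($seq, $min_len) = @_;
    for my $start (0 .. length($seq) - 1) {
        if (substr($seq, $start) =~ $loose && length($1) >= $min_len) {
            return substr($seq, $start + length($1));
        }
    }
    return $seq;    # nothing long enough was found
}

# The unguarded substitution matches the empty string and changes nothing.
(my $unguarded = "xxImhungryACGT")
    =~ s/^.*?I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2}//s;

print "$unguarded\n";                          # xxImhungryACGT
print trim_loose("xxImhungryACGT", 6), "\n";   # ACGT
print trim_loose("xxImungryyACGT", 6), "\n";   # ACGT
```

Note that this only tolerates missing or doubled letters from the marker itself. A different letter inserted in the middle (say "ImhuXngry") would still not match.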
