Author Topic: Programming Question - Regular Expressions  (Read 3058 times)


Johnson

Programming Question - Regular Expressions
« on: May 30, 2009, 02:53:02 AM »

Ok, time to geek out and test the theory that almost every liberty-minded person also happens to be a programmer... so someone here should be able to answer me...


Perl Compatible regular expressions...

I need to modify a regular expression that parses a URL out of a block of text.

It's currently two expressions

1st

replace
Code: [Select]
.*<.*?href=" with (nothing)


2nd

replace
Code: [Select]
".*?>.* with (nothing)

by (nothing) I mean that part is empty... the goal is to take the block of text and edit out everything that is not part of a URL, so that I am left with just the URL...

The problem is that when there are two URLs in the block of text, the whole thing explodes and doesn't work right at all...
Is there a way to modify the expressions to pluck out only the FIRST instance of a url?  I am a complete n00b to regular expressions.
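
For anyone following along, here is a minimal Python sketch of what those two replacements appear to do when a block contains two links. The sample text and the `strip_to_url` helper are made up for illustration, and Python's `re` module stands in for whatever PCRE engine is actually in use, so treat it as an approximation and not the real setup.

```python
import re

# Hypothetical sample: one line of HTML containing two links.
text = 'Read <a href="http://a.example/">A</a> then <a href="http://b.example/">B</a>.'

def strip_to_url(block):
    """Apply the two replacements from the post, in order."""
    # 1st: the leading greedy .* runs to the end of the line and backtracks,
    # so the match ends at the LAST href=" on that line.
    block = re.sub(r'.*<.*?href="', '', block)
    # 2nd: remove everything from the closing quote onward.
    block = re.sub(r'".*?>.*', '', block)
    return block

print(strip_to_url(text))  # http://b.example/
```

With this sample the result is the last URL, not the first, which points at the greedy `.*` at the front of the first expression.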
"In silent resignation, one must never submit to them voluntarily, and even if one is imprisoned in some ghastly dictatorship's jail, where no action is possible - serenity comes from the knowledge that one does NOT accept it. To deal with men by force, is as impractical as to deal with nature by persuasion... Which is the policy of savages who rule men by force, and who plead with nature by prayers, incantations and bribes (sacrifices)." - Ayn Rand

anarchir

Re: Programming Question - Regular Expressions
« Reply #1 on: May 30, 2009, 02:55:06 AM »

Was one, but stopped because it was too boring.
Good people disobey bad laws.
PreparedSecurity.com - Modern security and preparedness for the 21st century.

Johnson

Re: Programming Question - Regular Expressions
« Reply #2 on: May 30, 2009, 02:26:04 PM »

bump

rabidfurby

Re: Programming Question - Regular Expressions
« Reply #3 on: May 30, 2009, 03:23:10 PM »

You probably have greedy matching somewhere you don't want it. ".*" is greedy: it grabs as much as it can and only gives characters back when the rest of the pattern needs them. If you change it to ".*?", it will match as little as possible. It's hard to tell which quantifier is causing the problem, so if you can't figure it out with this, try posting an example of the text that's causing trouble.

There are also some nifty websites that let you type in a regex and some text and they'll show you in real time what gets matched.
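
A quick way to see the greedy/lazy difference is a small Python sketch like the one below. The sample tag is invented for illustration, and Python's `re` is used here only because its `*` and `*?` quantifiers behave the same way as the Perl-compatible ones for this case.

```python
import re

# Hypothetical sample tag with two quoted attributes.
tag = '<a href="http://a.example/" title="A">'

# Greedy: (.*) runs to the end, then backs off only to the LAST quote.
greedy = re.search(r'href="(.*)"', tag).group(1)
# Lazy: (.*?) stops at the FIRST quote that lets the pattern match.
lazy = re.search(r'href="(.*?)"', tag).group(1)

print(greedy)  # http://a.example/" title="A
print(lazy)    # http://a.example/
```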

Alex Libman 14

Re: Programming Question - Regular Expressions
« Reply #4 on: May 30, 2009, 04:19:25 PM »


Perhaps I misunderstood, but why not just do 
Code: [Select]
# Collect every href value; the first URL is then $URLs[0]
my @URLs = $HtmlString =~ /href="([^"]+)"/gi;

  ?



Do you need to replace the whole 
Code: [Select]
<a ...> ... </a>

  tag block in the original HTML string with just the URL?
« Last Edit: May 30, 2009, 04:22:15 PM by Alex Libman 18 »

Johnson

Re: Programming Question - Regular Expressions
« Reply #5 on: May 31, 2009, 01:45:09 PM »

No, what I need is a regular expression that could turn this post I am making right now... http://bbs.freetalklive.com/index.php?topic=29455.msg545420#msg545420

Into JUST one URL.... So if, for example... I posted a URL here to http://www.freetalklive.com it would be parsed out, because only the FIRST url would be included...

In other words... the regular expression would select the whole post, and erase everything but the first URL.  The only remaining text would be that first URL.

That first set of expressions works great if there is only one URL in the block of text (ANY block of text) but if there are two urls in the block like there are two urls in this post... It breaks.

« Last Edit: May 31, 2009, 01:48:53 PM by Johnson »

Alex Libman 14

Re: Programming Question - Regular Expressions
« Reply #6 on: May 31, 2009, 04:02:16 PM »

Then why not just set the $HtmlString to $1 or $URLs[0] after matching?


Or:   
Code: [Select]
# The lazy ^.*? stops at the first href, so $1 is the first URL in the string
$HtmlString =~ s/^.*?<[^>]*href="([^"]+)".*$/$1/si;
« Last Edit: May 31, 2009, 04:13:39 PM by Alex Libman 19 »

Johnson

Re: Programming Question - Regular Expressions
« Reply #7 on: June 01, 2009, 01:02:34 PM »

I'm hoping this way you phrased it will work...

This isn't for setting a variable... it's for a Yahoo Pipe (which is probably setting a variable in its own way) but... *I* have to conform to the structure of their user input form...

http://pipes.yahoo.com/pipes/pipe.info?_id=c0616032c1f43f1a3a3d026d54419959

Currently, there is a dysfunctional URL on display.... It's the title that starts with "Old RSS" and looks WACKY in proportion to the other stories in the feed.

Johnson

Re: Programming Question - Regular Expressions
« Reply #8 on: June 01, 2009, 01:08:30 PM »

so anyway... the statements have to be formatted for the yahoo regex module which expects two fields...

field 1 is "replace:"
field 2 is "with:"

Hence why I have two statements rather than just being able to assign the whole shebang to a variable.
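
If the "with:" field accepts a reference to a captured group (an assumption, nothing in this thread confirms it for the Yahoo module), a single replace/with pair could do the whole job. The Python sketch below shows the idea; `REPLACE`, `WITH`, and the sample text are illustrative names and data, and Python writes the group reference as `\1` where Perl would write `$1`.

```python
import re

# "replace:" pattern: skip lazily to the first href, capture its value,
# then swallow the rest of the block.
REPLACE = r'^.*?href="([^"]+)".*$'
# "with:" value: just the captured URL.
WITH = r'\1'

# Hypothetical sample spanning two lines, with two links.
text = 'Read <a href="http://a.example/">A</a>\nthen <a href="http://b.example/">B</a>.'

# re.S lets "." cross newlines so the pattern covers the whole block.
first_url = re.sub(REPLACE, WITH, text, flags=re.S)
print(first_url)  # http://a.example/
```

The design choice here is to keep what is wanted instead of erasing what is not, which avoids having to coordinate two separate erase steps.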

rabidfurby

Re: Programming Question - Regular Expressions
« Reply #9 on: June 01, 2009, 09:18:04 PM »

Quote
so anyway... the statements have to be formatted for the yahoo regex module which expects two fields...

field 1 is "replace:"
field 2 is "with:"

Hence why I have two statements rather than just being able to assign the whole shebang to a variable.

It sounds like you might need two regexen, then. One to replace .*?<a.*?href=" with nothing, the next to replace ">.* with nothing.

This is all assuming you can make the first regex run completely before the second, though. Another reason why drag-and-drop "now anyone can program!" bullshit will never amount to anything.

Johnson

Re: Programming Question - Regular Expressions
« Reply #10 on: June 01, 2009, 09:54:42 PM »

That's REALLY close to what I figured out... I see you have a second question mark.

I had:
.*<.*?href=" and ".*?>.*


whereas you suggest...
.*?<a.*?href=" and ">.*

They are very similar... but neither works.
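
One thing worth checking when the lazy version still misbehaves is whether the replacement is applied to every match. The Python sketch below (sample text and the `two_step` helper are invented for illustration, and `re.sub` replaces all matches by default) suggests that an unanchored lazy pattern simply matches a second time and eats the first URL, while anchoring it with `^` limits it to one match at the start.

```python
import re

# Hypothetical sample: one line of HTML containing two links.
text = 'Read <a href="http://a.example/">A</a> then <a href="http://b.example/">B</a>.'

def two_step(first_pattern, block):
    """Erase the lead-in with first_pattern, then erase the tail."""
    block = re.sub(first_pattern, '', block)  # replaces EVERY match
    return re.sub(r'">.*', '', block)

# Unanchored: matches once per link, so the first URL is erased too.
unanchored = two_step(r'.*?<a.*?href="', text)
# Anchored: ^ only matches at the start, so only the lead-in is erased.
anchored = two_step(r'^.*?<a.*?href="', text)

print(unanchored)  # http://b.example/
print(anchored)    # http://a.example/
```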

Johnson

Re: Programming Question - Regular Expressions
« Reply #11 on: June 01, 2009, 10:46:34 PM »

AHAHAHAH... I know why... omg... I'm such a douche...

OK.. so in the feed... the feed isn't pulling the ENTIRE body of an e-mail to put as the description... It's only pulling the first few lines (like a paragraph)...

So... my problem now is that I need a condition... to erase .* IF it doesn't match the patterns at all.

The way I had it set up... It replaces everything before and after a link... if it finds a link... but what about if there is no link... then it leaves it alone... that's no good!
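
One way to get that "erase everything if there is no link" behaviour from a single expression is an alternation with a catch-all second branch. This is a Python sketch under stated assumptions: `PATTERN` and `first_url_or_empty` are illustrative names, and the trick relies on Python (3.5 and later) substituting an empty string for a group that did not take part in the match. Other engines may treat that case differently.

```python
import re

# First alternative: a block with a link, keep only the first href value.
# Second alternative: any other block, matched whole with nothing captured.
PATTERN = r'^.*?href="([^"]+)".*$|^.*$'

def first_url_or_empty(block):
    # A group that did not participate in the match is replaced
    # with an empty string (Python 3.5+).
    return re.sub(PATTERN, r'\1', block, flags=re.S)

print(first_url_or_empty('See <a href="http://a.example/">A</a> for more'))
print(repr(first_url_or_empty('Just the first few lines, no link here')))
```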


Also... as far as the matching one time...
I fixed THAT part by doing this:

(.*<.*?href=")+?
and (".*?>.*)+?
 

blackie

Re: Programming Question - Regular Expressions
« Reply #12 on: June 01, 2009, 11:06:04 PM »


Johnson

Re: Programming Question - Regular Expressions
« Reply #13 on: June 01, 2009, 11:37:38 PM »

Got it...

actually they have expressions run in sequential order...

and while I am not able to write if/then statements... I am able to use filters to run the data through different pipes, which essentially acts as an if switch...
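
In ordinary code, the "filter as an if switch" idea looks roughly like the Python sketch below. It is only an analogy for the visual modules: `HREF`, `route`, and the sample items are hypothetical, and nothing here reflects how Yahoo Pipes is implemented.

```python
import re

# Hypothetical test for "this item contains a link".
HREF = re.compile(r'href="([^"]+)"')

def route(items):
    """Split feed items into those with a link and those without."""
    with_link, without_link = [], []
    for item in items:
        (with_link if HREF.search(item) else without_link).append(item)
    return with_link, without_link

items = ['<a href="http://a.example/">A</a>', 'plain text only']
linked, unlinked = route(items)
print(linked)    # items that go down the "extract the URL" branch
print(unlinked)  # items that go down the "no link" branch
```

Each branch can then get its own replacement rules, which is what an if/else would otherwise provide.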

also... they have loops... It's actually quite cool

Quote
This is all assuming you can make the first regex run completely before the second, though. Another reason why drag-and-drop "now anyone can program!" bullshit will never amount to anything.

I think that view is a little myopic. It's not really drag and drop programming. It's more of a mapping system allowing a visual interface for dealing with complex sets of data from webpages and rss feeds...

seriously... take a look at the source for the FMD yahoo pipe
http://pipes.yahoo.com/pipes/pipe.info?_id=c0616032c1f43f1a3a3d026d54419959

I am doing a TON of stuff to manipulate the feed coming out of Google.
« Last Edit: June 01, 2009, 11:40:06 PM by Johnson »

Johnson

Re: Programming Question - Regular Expressions
« Reply #14 on: June 01, 2009, 11:38:08 PM »
