Web Scraping

  • Curious if any of y'all have experience with this?

    I'm using PHP to pull HTML from a webpage, but I have to manipulate the data pretty heavily to get what I want off the page (regexmatchat and replace multiple times, etc.).

    Any tips or ideas for simplification?

  • I am not super familiar with PHP, but if you need to do DOM parsing, you are better off using a library. Avoid rolling your own regular expressions like the plague; they will only cause you trouble in the future.

    Looking around, I found this PHP DOM manipulation library: github.com/ivopetkov/html5-dom-document-php/. I have never used it, but it looks promising and has a bunch of stars on GitHub. It also seems to have documentation, which is always a plus.

    Even if you use a library, the problem with scraping is that it depends on the layout of the page you are pulling data from, so if that layout changes, you need to update your code. For that reason, I have found it is better to keep your queries just specific enough to get what you want. That way, if the layout changes a little but not too much, you might get away without changing anything on your side.

    What does this have to do with C3?

  • The app I'm using was created in C3, which can do anything really. I parse everything via C3... it's just easier to manipulate inside. I have considered using PHP to get what I need, but that's kinda above my pay grade.

    I'm using the meta tags rather than the raw data, which should be stable (as in not many changes, since changes would break every website using the info).

  • Have you tried writing a script in C3 and using DOMParser? You still need to do some coding, but it's all kept in C3.

    I'm sure extracting data from HTML like that will be much more convenient than working with regular expressions and raw text.

    Either way, I think you are going to need some coding, because C3 really doesn't have anything built in for this specific task.
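
    To make the DOMParser idea concrete, here is a minimal sketch in JavaScript (the language C3 scripts use). It assumes you already have the page's HTML as a string and that the values you want live in <meta> tags keyed by a "property" or "name" attribute, as Open Graph tags such as og:title are. The function names extractMeta and extractMetaFromHtml are made up for this example.

```javascript
// Collect <meta> tags from an already-parsed Document into a plain object.
// Keys come from the "property" attribute (Open Graph style) or "name".
function extractMeta(doc) {
  const result = Object.create(null); // no inherited keys to collide with
  for (const tag of doc.querySelectorAll("meta")) {
    const key = tag.getAttribute("property") || tag.getAttribute("name");
    const value = tag.getAttribute("content");
    // Keep the first occurrence of each key; skip tags missing either part.
    if (key && value !== null && !(key in result)) {
      result[key] = value;
    }
  }
  return result;
}

// Browser-only helper: turn an HTML string into a Document, then extract.
// DOMParser is a standard browser API, so no regular expressions are needed.
function extractMetaFromHtml(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return extractMeta(doc);
}
```

    With that in place, something like extractMetaFromHtml(html)["og:image"] would give you the image URL if the page declares one, and undefined if it doesn't. Which keys actually exist depends on the site, so treat og:image as an assumption to check against the pages you scrape.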

  • I've done some research on that, and frankly I'm lost as to how to implement DOM parsing. I see "simple DOM" for PHP, but that's seriously above my pay grade lol.

    I managed to get the following down from websites that have a widget added.

    That works 98% of the time, but it occasionally has issues loading the image (I put a default image in place, since some pages don't have an associated image or it fails to load).

    If you wouldn't mind expounding on the DOM, I'd appreciate the insight.
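
    On the image side, one possible cause (an assumption, since the actual failure isn't shown here) is that some pages give a missing, empty, or relative image URL in their meta tags. A small JavaScript sketch that falls back to a default in those cases could look like this. DEFAULT_IMAGE and resolveImageUrl are hypothetical names, and "default.png" stands in for whatever default image the app already uses.

```javascript
// Hypothetical default, used when a page has no usable image.
const DEFAULT_IMAGE = "default.png";

// Return an absolute image URL for a page, or the fallback when the
// meta value is missing, empty, malformed, or not an http(s) URL.
function resolveImageUrl(metaImage, pageUrl, fallback = DEFAULT_IMAGE) {
  if (typeof metaImage !== "string" || metaImage.trim() === "") {
    return fallback;
  }
  try {
    // new URL(relative, base) resolves paths like "/img/a.png" against the page.
    const url = new URL(metaImage.trim(), pageUrl);
    const isWeb = url.protocol === "http:" || url.protocol === "https:";
    return isWeb ? url.href : fallback;
  } catch {
    return fallback; // malformed image URL or invalid page URL
  }
}
```

    For example, resolveImageUrl("/img/a.png", "https://example.com/post/1") gives "https://example.com/img/a.png". This only covers bad or relative URLs; an image that exists in the meta tags but fails at load time would still need the default swapped in when the load error happens.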
