GTK+ Forums

Discussion forum for GTK+ and Programming. Ask questions, troubleshoot problems, view and post example code, or express your opinions.
All times are UTC




 Post subject: search google scholar
PostPosted: Sat Sep 22, 2012 5:29 am 
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
I have seen one Qt-based app (kbibtex) that can search Google Scholar.
Is it possible to achieve the same result with GTK+ and other GNOME libraries in C?
I tried lynx -dump and curl to run the search and fetch the results, but both failed.

Code:
$ cat websearchgooglescholar.cpp
/***************************************************************************
*   Copyright (C) 2004-2010 by Thomas Fischer                             *
*   fischer@unix-ag.uni-kl.de                                             *
*                                                                         *
*   This program is free software; you can redistribute it and/or modify  *
*   it under the terms of the GNU General Public License as published by  *
*   the Free Software Foundation; either version 2 of the License, or     *
*   (at your option) any later version.                                   *
*                                                                         *
*   This program is distributed in the hope that it will be useful,       *
*   but WITHOUT ANY WARRANTY; without even the implied warranty of        *
*   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         *
*   GNU General Public License for more details.                          *
*                                                                         *
*   You should have received a copy of the GNU General Public License     *
*   along with this program; if not, write to the                         *
*   Free Software Foundation, Inc.,                                       *
*   59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.             *
***************************************************************************/

#include <QSpinBox>
#include <QLayout>
#include <QLabel>
#include <QFormLayout>
#include <QNetworkReply>
#include <QNetworkCookieJar>

#include <KLocale>
#include <KMessageBox>
#include <KDebug>
#include <KConfigGroup>
#include <KLineEdit>
#include <KIcon>

#include <fileimporterbibtex.h>
#include "websearchgooglescholar.h"


class WebSearchGoogleScholar::WebSearchGoogleScholarPrivate
{
private:
    WebSearchGoogleScholar *p;

public:
    int numResults;
    QStringList listBibTeXurls;
    QString queryFreetext, queryAuthor, queryYear;
    QString startPageUrl;
    QString advancedSearchPageUrl;
    QString configPageUrl;
    QString setConfigPageUrl;
    QString queryPageUrl;
    FileImporterBibTeX importer;
    int numSteps, curStep;

    WebSearchGoogleScholarPrivate(WebSearchGoogleScholar *parent)
            : p(parent) {
        startPageUrl = QLatin1String("http://scholar.google.com/");
        configPageUrl = QLatin1String("http://%1/scholar_preferences");
        setConfigPageUrl = QLatin1String("http://%1/scholar_setprefs");
        queryPageUrl = QLatin1String("http://%1/scholar");
    }
};

WebSearchGoogleScholar::WebSearchGoogleScholar(QWidget *parent)
        : WebSearchAbstract(parent), d(new WebSearchGoogleScholar::WebSearchGoogleScholarPrivate(this))
{
    // nothing
}

WebSearchGoogleScholar::~WebSearchGoogleScholar()
{
    delete d;
}

void WebSearchGoogleScholar::startSearch()
{
    m_hasBeenCanceled = false;
    emit stoppedSearch(resultNoError);
}

void WebSearchGoogleScholar::startSearch(const QMap<QString, QString> &query, int numResults)
{
    d->numResults = numResults;
    m_hasBeenCanceled = false;
    d->curStep = 0;
    d->numSteps = numResults + 4;

    QStringList queryFragments;
    foreach(QString queryFragment, splitRespectingQuotationMarks(query[queryKeyFreeText])) {
        queryFragments.append(encodeURL(queryFragment));
    }
    foreach(QString queryFragment, splitRespectingQuotationMarks(query[queryKeyTitle])) {
        queryFragments.append(encodeURL(queryFragment));
    }
    d->queryFreetext = queryFragments.join("+");
    queryFragments.clear();
    foreach(QString queryFragment, splitRespectingQuotationMarks(query[queryKeyAuthor])) {
        queryFragments.append(encodeURL(queryFragment));
    }
    d->queryAuthor = queryFragments.join("+");
    d->queryYear = encodeURL(query[queryKeyYear]);

    KUrl url(d->startPageUrl);
    QNetworkRequest request(url);
    setSuggestedHttpHeaders(request);
    QNetworkReply *reply = networkAccessManager()->get(request);
    setNetworkReplyTimeout(reply);
    connect(reply, SIGNAL(finished()), this, SLOT(doneFetchingStartPage()));

    emit progress(0, d->numSteps);
}

void WebSearchGoogleScholar::doneFetchingStartPage()
{
    emit progress(++d->curStep, d->numSteps);

    QNetworkReply *reply = static_cast<QNetworkReply*>(sender());

    if (handleErrors(reply)) {
        QMap<QString, QString> inputMap = formParameters(reply->readAll(), "<form ");
        inputMap["hl"] = "en";

        KUrl url(d->configPageUrl.arg(reply->url().host()));
        for (QMap<QString, QString>::ConstIterator it = inputMap.constBegin(); it != inputMap.constEnd(); ++it)
            url.addQueryItem(it.key(), it.value());

        QNetworkRequest request(url);
        setSuggestedHttpHeaders(request, reply);
        QNetworkReply *newReply = networkAccessManager()->get(request);
        setNetworkReplyTimeout(newReply);
        connect(newReply, SIGNAL(finished()), this, SLOT(doneFetchingConfigPage()));
    } else
        kDebug() << "url was" << reply->url().toString();
}

void WebSearchGoogleScholar::doneFetchingConfigPage()
{
    emit progress(++d->curStep, d->numSteps);

    QNetworkReply *reply = static_cast<QNetworkReply*>(sender());

    if (handleErrors(reply)) {
        QMap<QString, QString> inputMap = formParameters(reply->readAll(), "<form ");
        inputMap["hl"] = "en";
        inputMap["scis"] = "yes";
        inputMap["scisf"] = "4";
        inputMap["num"] = QString::number(d->numResults);

        KUrl url(d->setConfigPageUrl.arg(reply->url().host()));
        for (QMap<QString, QString>::ConstIterator it = inputMap.constBegin(); it != inputMap.constEnd(); ++it)
            url.addQueryItem(it.key(), it.value());

        QNetworkRequest request(url);
        setSuggestedHttpHeaders(request, reply);
        QNetworkReply *newReply = networkAccessManager()->get(request);
        setNetworkReplyTimeout(newReply);
        connect(newReply, SIGNAL(finished()), this, SLOT(doneFetchingSetConfigPage()));
    } else
        kDebug() << "url was" << reply->url().toString();
}

void WebSearchGoogleScholar::doneFetchingSetConfigPage()
{
    emit progress(++d->curStep, d->numSteps);

    QNetworkReply *reply = static_cast<QNetworkReply*>(sender());

    if (handleErrors(reply)) {
        QMap<QString, QString> inputMap = formParameters(reply->readAll(), "<form ");
        QStringList dummyArguments = QStringList() << "as_epq" << "as_oq" << "as_eq" << "as_occt" << "as_publication" << "as_sdtf";
        foreach(QString dummyArgument, dummyArguments) {
            inputMap[dummyArgument] = "";
        }
        inputMap["hl"] = "en";
        inputMap["num"] = QString::number(d->numResults);

        KUrl url(QString(d->queryPageUrl).arg(reply->url().host()));
        for (QMap<QString, QString>::ConstIterator it = inputMap.constBegin(); it != inputMap.constEnd(); ++it)
            url.addQueryItem(it.key(), it.value());
        url.addEncodedQueryItem(QString("as_q").toAscii(), d->queryFreetext.toAscii());
        url.addEncodedQueryItem(QString("as_sauthors").toAscii(), d->queryAuthor.toAscii());
        url.addEncodedQueryItem(QString("as_ylo").toAscii(), d->queryYear.toAscii());
        url.addEncodedQueryItem(QString("as_yhi").toAscii(), d->queryYear.toAscii());
        url.addQueryItem("btnG", "Search Scholar");

        QNetworkRequest request(url);
        setSuggestedHttpHeaders(request, reply);
        QNetworkReply *newReply = networkAccessManager()->get(request);
        setNetworkReplyTimeout(newReply);
        connect(newReply, SIGNAL(finished()), this, SLOT(doneFetchingQueryPage()));
    } else
        kDebug() << "url was" << reply->url().toString();
}

void WebSearchGoogleScholar::doneFetchingQueryPage()
{
    emit progress(++d->curStep, d->numSteps);

    QNetworkReply *reply = static_cast<QNetworkReply*>(sender());

    if (handleErrors(reply)) {
        QString htmlText = reply->readAll();

        QRegExp linkToBib("/scholar.bib\\?[^\" >]+");
        int pos = 0;
        d->listBibTeXurls.clear();
        while ((pos = linkToBib.indexIn(htmlText, pos)) != -1) {
            d->listBibTeXurls << "http://" + reply->url().host() + linkToBib.cap(0).replace("&amp;", "&");
            pos += linkToBib.matchedLength();
        }

        if (!d->listBibTeXurls.isEmpty()) {
            QNetworkRequest request(d->listBibTeXurls.first());
            setSuggestedHttpHeaders(request, reply);
            QNetworkReply *newReply = networkAccessManager()->get(request);
            setNetworkReplyTimeout(newReply);
            connect(newReply, SIGNAL(finished()), this, SLOT(doneFetchingBibTeX()));
            d->listBibTeXurls.removeFirst();
        } else {
            emit stoppedSearch(resultNoError);
            emit progress(d->numSteps, d->numSteps);
        }
    } else
        kDebug() << "url was" << reply->url().toString();
}

void WebSearchGoogleScholar::doneFetchingBibTeX()
{
    emit progress(++d->curStep, d->numSteps);

    QNetworkReply *reply = static_cast<QNetworkReply*>(sender());

    if (handleErrors(reply)) {
        QString rawText = reply->readAll();
        File *bibtexFile = d->importer.fromString(rawText);

        Entry *entry = NULL;
        if (bibtexFile != NULL) {
            for (File::ConstIterator it = bibtexFile->constBegin(); entry == NULL && it != bibtexFile->constEnd(); ++it) {
                entry = dynamic_cast<Entry*>(*it);
                if (entry != NULL) {
                    Value v;
                    v.append(new VerbatimText(label()));
                    entry->insert("x-fetchedfrom", v);
                    emit foundEntry(entry);
                }
            }
            delete bibtexFile;
        }

        if (entry == NULL) {
            // reply->readAll() was already consumed above, so log the saved text instead
            kWarning() << "Searching" << label() << "resulted in invalid BibTeX data:" << rawText;
            emit stoppedSearch(resultUnspecifiedError);
            return;
        }

        if (!d->listBibTeXurls.isEmpty()) {
            QNetworkRequest request(d->listBibTeXurls.first());
            setSuggestedHttpHeaders(request, reply);
            QNetworkReply *newReply = networkAccessManager()->get(request);
            setNetworkReplyTimeout(newReply);
            connect(newReply, SIGNAL(finished()), this, SLOT(doneFetchingBibTeX()));
            d->listBibTeXurls.removeFirst();
        } else {
            emit stoppedSearch(resultNoError);
            emit progress(d->numSteps, d->numSteps);
        }
    } else
        kDebug() << "url was" << reply->url().toString();
}

QString WebSearchGoogleScholar::label() const
{
    return i18n("Google Scholar");
}

QString WebSearchGoogleScholar::favIconUrl() const
{
    return QLatin1String("http://scholar.google.com/favicon.ico");
}

WebSearchQueryFormAbstract* WebSearchGoogleScholar::customWidget(QWidget *)
{
    return NULL;
}

KUrl WebSearchGoogleScholar::homepage() const
{
    return KUrl("http://scholar.google.com/");
}

void WebSearchGoogleScholar::cancel()
{
    WebSearchAbstract::cancel();
}


 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 7:49 am 
Never Seen the Sunlight

Joined: Mon Apr 28, 2008 5:52 am
Posts: 709
Location: UK
Hi,

Pasting a large chunk of Qt/KDE code on a GTK+ forum is not really going to get you much help. You would do better to describe in detail what you want to do, rather than saying "Qt can do this, here is the code, I want the same in GTK+".

_________________
E.


 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 8:09 am 
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Hi errol and others,
Yes, I was a little sceptical about posting that chunk, but I thought an example is better than 100 words. It was not meant as a comparison; I am extremely sorry if it sounded that way.
So, here is what I am looking for: I want to search Google Scholar (GS). Say I put the search criteria in an entry; by some means, I then want to fetch the GS search results for those criteria.
Say I want to search GS for Albert Einstein+1905.

So, from my GTK+ C code, I would love to get the result of this search:
Quote:
http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905&btnG=&as_sdt=1%2C5&as_sdtp=


I guess I have managed to make things clear.
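
The first half of that request, turning the text of an entry into the quoted query URL, needs no GTK+ at all. Below is a minimal sketch in plain C, under stated assumptions: the function names encode_query and build_scholar_url are hypothetical, only the hl and q parameters from the URL quoted above are kept, and nothing is sent over the network. In a GLib program, g_uri_escape_string() could probably replace the hand-written encoder.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode src into dst. Letters, digits and "-_.~" pass through,
 * a space becomes '+' (as in the query string quoted above), and every
 * other byte becomes %XX. Returns 0 on success, -1 if dst is too small.
 * Hypothetical helper, not part of GTK+ or GLib. */
static int encode_query(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;

        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            if (n + 1 >= dst_size)
                return -1;
            dst[n++] = (char)c;
        } else if (c == ' ') {
            if (n + 1 >= dst_size)
                return -1;
            dst[n++] = '+';
        } else {
            if (n + 3 >= dst_size)
                return -1;
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return 0;
}

/* Build a search URL for the given terms. Host and parameters are copied
 * from the URL quoted in the post; they are assumptions about the site,
 * not a documented interface. Returns 0 on success, -1 on overflow. */
static int build_scholar_url(const char *terms, char *url, size_t url_size)
{
    char encoded[512];
    int written;

    if (encode_query(terms, encoded, sizeof encoded) != 0)
        return -1;

    written = snprintf(url, url_size,
                       "http://scholar.google.co.uk/scholar?hl=en&q=%s",
                       encoded);
    return (written < 0 || (size_t)written >= url_size) ? -1 : 0;
}
```

Calling build_scholar_url("albert einstein+1905", url, sizeof url) with a large enough buffer fills url with http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905, which matches the start of the quoted URL. In a GTK+ program the terms would come from gtk_entry_get_text().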


Attachments:
gs.png
 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 4:23 pm 
Never Seen the Sunlight

Joined: Wed Jul 23, 2008 10:31 am
Posts: 2406
Location: Slovenia
Hi.

Like most of Google's services, I'd guess Google Scholar also has a web-based API that third-party developers can use in their apps. What you need is a library that can do the "web talk" for you, and there are plenty available. Most GTK+ apps use libsoup, since it is GObject-based and can easily be included in an app.

Cheers,
Tadej


 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 4:50 pm 
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Unfortunately, Google Scholar has no API.


 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 5:23 pm 
Never Seen the Sunlight

Joined: Wed Jul 23, 2008 10:31 am
Posts: 2406
Location: Slovenia
Hi.

Well, in that case you'll need to construct the search URI yourself, send it to the server, and then scrape the resulting HTML (looking a bit closer, this is probably what the Qt code is doing). The library requirements stay the same in both cases.
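
The scraping step can be sketched in plain C as well. In the Qt code posted earlier, doneFetchingQueryPage() collects every match of the pattern /scholar.bib\?[^" >]+ and turns "&amp;" back into "&". The function below approximates that with literal string searching instead of a regular expression. The name next_bibtex_link is hypothetical, and whether the result pages still contain such links is an assumption taken from that code, not something checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINK_PREFIX "/scholar.bib?"

/* Copy the next "/scholar.bib?..." link found at or after *cursor into
 * out, replacing "&amp;" with "&", and advance *cursor past the link.
 * Returns 1 if a link was copied, 0 if none was found or out is too
 * small (in which case *cursor is left unchanged).
 * Hypothetical helper, written for illustration only. */
static int next_bibtex_link(const char **cursor, char *out, size_t out_size)
{
    const size_t prefix_len = strlen(LINK_PREFIX);
    const char *start, *end;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL && out != NULL);

    start = *cursor;
    for (;;) {
        start = strstr(start, LINK_PREFIX);
        if (start == NULL)
            return 0;
        /* The link runs up to the first quote, space or '>' (or the end
         * of the input), like the class [^" >] in the Qt pattern. */
        end = start + prefix_len + strcspn(start + prefix_len, "\" >");
        if (end > start + prefix_len)
            break;          /* at least one character after the '?' */
        start = end;        /* empty query part: keep looking */
    }

    for (const char *p = start; p < end; ) {
        if (n + 1 >= out_size)
            return 0;
        if (strncmp(p, "&amp;", 5) == 0) {
            out[n++] = '&';
            p += 5;
        } else {
            out[n++] = *p++;
        }
    }
    out[n] = '\0';
    *cursor = end;
    return 1;
}
```

A caller would set a cursor to the start of the downloaded page and call next_bibtex_link() in a loop until it returns 0, prefixing each result with "http://" and the host name, as the Qt code does. Fetching the page itself is the job of an HTTP library such as libsoup and is not shown.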

Cheers,
Tadej


 Post subject: Re: search google scholar
PostPosted: Sat Sep 22, 2012 9:45 pm 
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
If it is not asking too much of a favour: has anyone done this type of work (perhaps scraping some other website)? Or could you point me to an appropriate source?
On the net, I failed to find any "tutorials" on libsoup and HTML scraping.

